1 Introduction

The emergence of large-scale quantum computing devices would have enormous consequences in physics, mathematics and computer science.

While quantum hegemony has yet to be achieved by these machines, the field of post-quantum cryptography has become very active in the last twenty years, as it is of foremost importance to achieve security today against the possible adversaries of tomorrow. As a consequence, post-quantum asymmetric primitives are being developed and standardized to protect public-key cryptography against the ravages of Shor’s period-finding algorithm ([49]), which provides an exponential advantage to a quantum adversary compared to all known classical factorization algorithms.

Symmetric Cryptography in the Quantum World. In the symmetric setting, Grover’s algorithm can quadratically speed up the classical exhaustive key search. As a result, ideal ciphers with k-bit keys would provide only k/2-bit security in a post-quantum world. The confidence we have in real symmetric primitives is based on cryptanalysis, i.e. the more we analyze a primitive without finding any weakness, the more trust we can put in it. Until recently, little was known about how quantum adversaries could attack symmetric primitives, and therefore equally little was known about their security and the confidence they should inspire in a quantum world.

This is why turning classical attacks into quantum attacks was studied in [34, 36]. By transposing the weaknesses of an encryption function to the post-quantum world, it is indeed possible to improve on the naive, all-purpose Grover search. How classical attacks can be “quantized” requires, however, a careful analysis.

Besides, if the adversary has stronger capacities than the mere access to a quantum computing device (e.g., if she can ask superposition chosen-plaintext queries), an exponential speedup has been shown to occur for some constructions. This was first noted by Kuwakado and Morii against the Even-Mansour construction ([42]) and the three-round Feistel ([41]), and later extended to slide attacks and modes of operation for MACs ([35]). All these attacks use Simon’s algorithm [50].

Quantum Collision and Multi-target Preimage Search. In a classical setting, it is well known that finding a collision for a random function H on n bits, i.e. a pair (x, y) with \(x \ne y\) such that \(H(x) = H(y)\), costs \(O\left( 2^{n/2}\right) \) in time and queries [48]. A parallelization of this algorithm was proposed in [47]; its product of time and memory complexities is also \(O\left( 2^{n/2}\right) \).

In a quantum setting, an algorithm was presented by Brassard, Høyer and Tapp in [20] that, given superposition query access to a (2-to-1) function H on n bits, outputs a collision using \(\widetilde{O}\left( 2^{n / 3}\right) \) superposition queries to H. This algorithm is optimal, but only in terms of query complexity, while its product of time and memory complexities is as high as \(\widetilde{O}\left( 2^{2n / 3}\right) \) and makes it non-competitive when compared to the classical attack.

Regarding the multi-target preimage search, i.e. given \(2^t\) values \(H(x_i)\) for i from 1 to \(2^t\), find one of the \(x_i\), the best classical algorithm finds a preimage in time \(O\left( 2^{n\,-\,t}\right) \). In the quantum setting, the best algorithm takes time \(\widetilde{O}(2^{n / 2})\): it consists in fact of finding the preimage of a single chosen target with Grover’s algorithm.

1.1 Contributions

The contributions we present in this paper are twofold:

Improved Algorithms for Collision and Multi-target Preimage Search. First, we propose a new quantum algorithm for collision search, based on amplitude amplification, which runs in real time \(\widetilde{O} \left( 2^{2n/5}\right) \) with a single quantum processor, uses O(n) qubits of memory, and \(\widetilde{O}\left( 2^{n/5}\right) \) bits of classical storage, accessed via a classical processor. The algorithm can be adapted to solve the multi-target preimage problem, with a running time \(\widetilde{O}\left( 2^{3n/7}\right) \), the same quantum requirements and \(\widetilde{O}\left( 2^{n/7}\right) \) bits of classical storage.

We also extend these results to the case where quantum parallelization is allowed. These quantum algorithms are the first ones to significantly improve on the best classical algorithms for those two problems. These results also solve an open problem and contradict a conjecture on the complexity of quantum collision and multi-target preimage search, as we will detail in Sect. 7.2.

Implications of these Algorithms. We have studied the impact of these new algorithms on several cryptographic settings, and obtained the following conclusions:

  • Hash functions: We are able to improve the best known collision and multi-target preimage attacks when the attacker has only access to a quantum computer.

  • Multi-user setting: We are able to improve the best known attacks in a multi-user setting, i.e. recover one key out of many, thanks to the multi-target preimage search improved algorithm. The model for the attacker here is also not very strong, and we only suppose she has access to a quantum computer.

  • Operation Modes: Considering collision attacks on operation modes, we are able to improve them with our new algorithms. In this case, the attacker is placed in a more powerful model: she can make superposition queries to a quantum cryptographic oracle. The question of a new data limit enforcement is raised.

  • Bricks for cryptanalysis techniques: we show how these algorithms can be used as building blocks in more complex cryptanalytic techniques.

We also discuss the implications of these attacks with respect to security bounds for symmetric cryptographic primitives.

Organisation. In the next section, we detail some security notions that will be considered in our applications and some basic notions of quantum computing. In Sect. 3, we present the considered problems: collision and multi-target preimage search, and we report the state of the art of the previous best known quantum and classical algorithms for solving them. We also present some cryptographic scenarios where these problems are shown to be useful. In Sect. 4, we develop a prerequisite building block of our algorithms, while Sect. 5 is dedicated to detailing the new algorithms and possible trade-offs. In Sect. 6 we analyze the impact of these algorithms with respect to the cryptographic scenarios previously presented. A discussion of our results and a conclusion are provided in Sect. 7. In the auxiliary supporting material appended to this submission, we deal with the algorithmic imperfections of the amplitude amplification algorithm.

2 Preliminaries

This section describes some concepts that will be needed for presenting our results: we first provide some classical security notions. Next we describe the two models most commonly considered for quantum adversaries, as both will be considered in applications (Sect. 6). Finally, we will briefly describe the basic quantum computing notions that we will need in order to explain our new algorithms in Sect. 5.

2.1 Some Classical Security Notions

In this section we briefly describe some notions from symmetric key cryptography that will be used throughout the paper.

Key Recovery Attack. Consider a cipher \(E_K\), that is a pseudo-random permutation parameterized by a secret key K of size k. This cipher takes as input a plaintext of size n and generates a ciphertext of the same size. In the common known-plaintext setting (KPA), the attacker gets access to some pairs of plaintexts and ciphertexts \((P_i,C_i)\). Sometimes the attacker is also allowed to choose the plaintexts: this is called a chosen-plaintext attack (CPA).

It is always possible to perform an exhaustive search on the key and to find the correct one as the one that verifies \(C_i=E_K(P_i)\) for all i. The cost of this is \(2^{k}\) encryptions, and this is the security an ideal cipher provides: the best attack is the generic attack. Therefore, a cipher is broken if we are able to recover its key with a complexity smaller than \(2^k\). The data complexity will be the number of calls to the cryptographic oracle \(E_K\), i.e. the number of pairs \((P_i,C_i)\) needed to successfully perform the attack; the time complexity is the overall time needed to recover the key, and the memory complexity is the amount of memory needed to perform the attack.
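As a toy illustration of the generic attack described above, the following sketch runs an exhaustive key search against a deliberately tiny cipher. The cipher `toy_cipher` (a key-seeded random permutation on 8-bit blocks, with an 8-bit key), the secret key value and the chosen plaintexts are all assumptions made for the example; they do not come from the paper.

```python
import random

K_BITS, N_BITS = 8, 8  # toy key and block sizes (assumptions for the example)

def toy_cipher(key):
    """Hypothetical E_K: a key-seeded random permutation of {0,1}^N_BITS."""
    table = list(range(2 ** N_BITS))
    random.Random(key).shuffle(table)
    return table

def exhaustive_search(pairs):
    """Try every key; return the first one with C_i = E_K(P_i) for all pairs."""
    for trials, key in enumerate(range(2 ** K_BITS), start=1):
        table = toy_cipher(key)
        if all(table[p] == c for p, c in pairs):
            return key, trials
    return None

secret_key = 0xA7
E = toy_cipher(secret_key)
pairs = [(p, E[p]) for p in (3, 77, 200, 255)]  # data complexity: 4 known pairs
found_key, trials = exhaustive_search(pairs)
```

The search never needs more than \(2^k\) trial keys, which matches the \(2^k\) generic bound; the number of pairs used is the data complexity.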

Distinguisher and Plaintext Recovery. Key recovery attacks are the strongest, but being able to distinguish the generated ciphertexts from random values is also considered a weakness. Moreover, when the attacker only captures ciphertexts, she shouldn’t be able to recover information of any kind on the corresponding plaintexts.

Modes of Operation. In order to treat messages of different lengths and to provide specific security properties, such as confidentiality and integrity, block ciphers are typically used in operation modes. One of the security properties that these modes should offer is, for instance, to prevent an attacker from identifying when two identical blocks have been encrypted under the same key, without having to change the key for each block (which wouldn’t be very efficient). Some popular modes are Cipher Block Chaining, CBC [26], and Counter Mode, CTR [25]. It is also possible to build authenticated encryption primitives by using authentication modes, such as the Offset Codebook Mode, OCB [39], proposed by Krovetz and Rogaway. Their security has been widely studied in the classical setting ([8]), and more recently in a post-quantum setting ([5]).

A plaintext m is split in blocks \(m_0 \ldots m_{l\,-\,1}\), that will be encrypted with the help of the cipher \(E_K\) and combined; the ciphertext is \(c = c_0 \ldots c_{l\,-\,1}.\)

CBC. The Cipher Block Chaining (CBC) mode of operation defines the ciphertext blocks as follows: \(c_0 = E_K(m_0 \oplus IV)\) and for all \(1 \le i \le l-1\):

$$c_i = E_K(m_i \oplus c_{i\,-\,1})$$

where IV is a public initial value.

The block size being n, some restrictions on the maximal number of blocks encrypted under the same key must be enforced. Indeed, the birthday paradox implies that after recovering \(2^{n/2}\) encrypted blocks, there is a high (and constant) probability that two of them are equal, leading to:

$$E_K(m_i \oplus c_{i\,-\,1}) = E_K(m_j \oplus c_{j\,-\,1}) .$$

And since \(E_K\) is a permutation, we get \(m_i \oplus c_{i\,-\,1} = m_j \oplus c_{j\,-\,1}\), hence \(m_i \oplus m_j = c_{i\,-\,1} \oplus c_{j\,-\,1}\): the XOR of two plaintext blocks is obtained from the knowledge of the ciphertexts alone.
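The leak can be reproduced on a toy scale. The sketch below is only an illustration under stated assumptions: `make_cipher` is a hypothetical 8-bit block cipher (a key-seeded random permutation), and 300 blocks are encrypted so that a repeated ciphertext block is guaranteed by the pigeonhole principle. The observer uses only the IV and the ciphertext blocks, and recovers \(m_i \oplus m_j\) as \(c_{i-1} \oplus c_{j-1}\), with the convention that the value chained into the first block is the IV.

```python
import random

N_BITS = 8  # toy block size (assumption; real block ciphers use 64 or 128 bits)

def make_cipher(key):
    """Hypothetical E_K: a key-seeded random permutation of {0,1}^N_BITS."""
    table = list(range(2 ** N_BITS))
    random.Random(key).shuffle(table)
    return table.__getitem__

def cbc_encrypt(E, iv, blocks):
    """c_0 = E(m_0 ^ IV), then c_i = E(m_i ^ c_{i-1})."""
    out, prev = [], iv
    for m in blocks:
        prev = E(m ^ prev)
        out.append(prev)
    return out

def find_leak(iv, cipher_blocks):
    """Return (i, j, m_i ^ m_j) for the first repeated ciphertext block."""
    chain = [iv] + cipher_blocks  # chain[i] is the value XORed into m_i
    seen = {}
    for j, c in enumerate(cipher_blocks):
        if c in seen:
            i = seen[c]
            return i, j, chain[i] ^ chain[j]
        seen[c] = j
    return None

rng = random.Random(1)
E = make_cipher(key=42)
iv = rng.randrange(2 ** N_BITS)
msg = [rng.randrange(2 ** N_BITS) for _ in range(300)]
ct = cbc_encrypt(E, iv, msg)
leak = find_leak(iv, ct)  # computed without the key or the plaintext
```

With an n-bit block, the first repetition is expected after roughly \(2^{n/2}\) blocks, which is why the toy parameters make it appear almost immediately.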

CTR. In the counter mode (CTR), blocks \(m_i\) are encrypted as \(c_i = E_K(IV \oplus i) \oplus m_i\) where IV is an initial public value, and i is a counter initialized to zero. As all the inputs of the encryption function are different, we won’t have collisions due to the birthday paradox as in the CBC case, but this lack of collisions can be exploited to distinguish the construction if more than the recommended bound of \(2^{n/2}\) blocks of data was generated with the same key.

2.2 Quantum Adversary Models

In this section we describe and justify the two models most commonly considered for quantum adversaries. The application scenarios described in Sect. 6 will use both of them.

Model \(Q_1\). The adversary has access to a quantum computer: this is the case, for instance, in [15, 18, 51, 54]. The adversary can query a quantum random oracle with arbitrary superpositions of the inputs, but is only able to make classical queries to a classical encryption oracle (and therefore no quantum superposition queries to the cryptographic oracle).

Model \(Q_2\). In this case, the adversary is allowed to perform quantum superposition queries to a remote quantum cryptographic oracle (qCPA): she obtains a superposition of the outputs. This model has been considered for instance in [15, 16, 23, 29, 35, 52]. This is a strong model, but it has the advantages of being simple, inclusive of any possible intermediate and more realistic model, and achievable. In particular, in several of these publications, secure constructions were provided for this scenario.

2.3 Quantum Computing

In this section we provide some basic notions from quantum computing that will be used throughout the paper. The interested reader can see [46] for a detailed introduction to quantum computing.

Quantum Oracles for Functions. Any function \(f~: \{0,1\}^n \rightarrow \{0,1\}^m\) with a known circuit description can be efficiently implemented as a quantum unitary \(O_f\) on \(n + m\) qubits, with:

$$O_f~: \left| x \right\rangle \left| y \right\rangle \mapsto \left| x \right\rangle \left| y \oplus f(x) \right\rangle .$$

The quantum running time of \(O_f\) is twice the running time of f (see Footnote 1).
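To make the definition concrete, the sketch below applies \(O_f\) to computational basis states only, represented as pairs of integers. The example function `f` and the sizes n = 3, m = 2 are hypothetical choices. The check confirms that the map \((x, y) \mapsto (x, y \oplus f(x))\) permutes the basis and is its own inverse, which is why it extends by linearity to a unitary.

```python
from itertools import product

def make_oracle(f):
    """O_f restricted to basis states: (x, y) -> (x, y XOR f(x))."""
    return lambda x, y: (x, y ^ f(x))

n, m = 3, 2                       # input and output sizes (toy values)
f = lambda x: (x * x) % (2 ** m)  # hypothetical example function
O_f = make_oracle(f)

basis = list(product(range(2 ** n), range(2 ** m)))
images = [O_f(x, y) for x, y in basis]

is_permutation = sorted(images) == basis
is_involution = all(O_f(*O_f(x, y)) == (x, y) for x, y in basis)
```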

Projection Oracle. Let P be a projector acting on n qubits. We define \(O_P\) as the following unitary acting on \(n+1\) qubits:

$$ O_P (\left| \psi \right\rangle \left| b \right\rangle ) := \left\{ \begin{array}{ll} \left| \psi \right\rangle \left| b \oplus 1 \right\rangle &{} \text { if } \left| \psi \right\rangle \in Im(P) \\ \left| \psi \right\rangle \left| b \right\rangle &{} \text { if } \left| \psi \right\rangle \in Ker(P) \end{array} \right. . $$

The above expression defines \(O_P\) on a basis of the \(n+1\) qubit pure states and \(O_P\) is therefore defined for all states by linearity.

Amplitude Amplification. One of the main tools we will use in our algorithms is the quantum amplitude amplification routine.

Theorem 1

([19], Quantum amplitude amplification). Let P be a projector acting on n qubits and \(O_P\) be a projection oracle for P. Let \(\mathcal {A}\) be a quantum unitary that produces a state \(\left| \phi \right\rangle = \alpha \left| \phi _P \right\rangle + \beta \left| \phi _P^\bot \right\rangle \) where \(\left| \phi _P \right\rangle \in Im(P)\) and \(\left| \phi _P^\bot \right\rangle \in Ker(P)\). Notice that \(tr(P \left| \phi \rangle \langle \phi \right| ) = |\alpha |^2\). We write \(|\alpha | = \sin (\theta )\) for some \(\theta \in [0, \pi /2]\). There exists a quantum algorithm that:

  • Consists of exclusively \(N = \left\lfloor \frac{\pi }{4\theta } - \frac{1}{2} \right\rfloor \) calls to \(O_P,O_P^\dagger ,\mathcal {A},\mathcal {A}^\dagger \) and a final measurement.

  • Produces a quantum state close to \(\left| \phi _P \right\rangle \).

The algorithm \(\mathcal {A}\) is called the setup and the projection P the projector of the quantum amplification algorithm. This whole procedure will be denoted

$$\mathtt {QAA}(\mathtt {setup}, \mathtt {proj}) = \mathtt {QAA}(\mathcal {A},P)$$

and its running time is

$$ N \left( |\mathcal {A}|_{RT} + |O_P|_{RT}\right) .$$

where the notation \(|\cdot |_{RT}\) represents the running time of the respective algorithms. If both \(\mathcal {A}\) and \(O_P\) can be done in time polynomial in n, the above is \(\widetilde{O}\left( N\right) \).

This projection P can sometimes be characterized by a test function f, such that \(\left| x \right\rangle \in Im(P)\) when \(f(x) = 1\) and \(\left| x \right\rangle \in Ker(P)\) when \(f(x) = 0\). Amplitude amplification can be seen as a generalization of Grover’s algorithm. Let us briefly show how to retrieve it.

Grover’s Algorithm. We are given an efficiently computable function \(f : \{0,1\}^n \rightarrow \{0,1\}\) and we want to find an element x such that \(f(x) = 1\). We take P such that \(\left| x \right\rangle \in Im(P)\) when \(f(x) = 1\) and \(\left| x \right\rangle \in Ker(P)\) when \(f(x) = 0\). \(O_P\) can be constructed with a single call to \(O_f\). We use as setup the algorithm \(\mathcal {A}\) that produces the state \(\left| \phi \right\rangle = {\frac{1}{2^{n/2}}} \sum _{x \in \{0,1\}^n} \left| x \right\rangle \). In order to produce \(\left| \phi \right\rangle \), we perform a Hadamard operation on each qubit, which is very efficient.

We write \(\left| \phi \right\rangle = \frac{1}{2^{n/2}} \sum _{x: f(x) = 1} \left| x \right\rangle + \frac{1}{2^{n/2}} \sum _{x: f(x) = 0} \left| x \right\rangle \). We have \(tr(P \left| \phi \rangle \langle \phi \right| ) = \frac{|\{x : f(x) = 1\}|}{2^n}\). Using the above quantum amplitude amplification procedure \(\mathtt {QAA}(\mathcal {A},P)\), and by measuring the obtained state, we can find with high probability an element x such that \(f(x) = 1\) in time \(\widetilde{O} \left( \sqrt{\frac{2^n}{|\{x : f(x) = 1\}|}} \right) \).

For most applications, e.g. quantum exhaustive key search, there is only one “marked” element x such that \(f(x) = 1\) (e.g., the key). Then Grover search attains a complexity \(\widetilde{O} \left( \sqrt{2^n} \right) \).
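The iteration count of Theorem 1 can be checked numerically on a small instance. The following is a classical state-vector simulation, not a quantum implementation: it assumes the uniform setup, models the projection oracle as a phase flip on marked elements, and uses the standard inversion about the mean as the reflection around the setup state. The size n = 8 and the marked element `0xB5` are arbitrary choices for the example.

```python
import math

def grover_success(n, f):
    """Simulate amplitude amplification with the uniform setup on n qubits."""
    size = 2 ** n
    marked = [x for x in range(size) if f(x)]
    theta = math.asin(math.sqrt(len(marked) / size))       # |alpha| = sin(theta)
    iterations = math.floor(math.pi / (4 * theta) - 0.5)   # N from Theorem 1
    amp = [1 / math.sqrt(size)] * size                      # state produced by A
    for _ in range(iterations):
        # Oracle step: flip the sign of the marked amplitudes.
        amp = [-a if f(x) else a for x, a in enumerate(amp)]
        # Reflection around the uniform state (inversion about the mean).
        mean = sum(amp) / size
        amp = [2 * mean - a for a in amp]
    p_success = sum(amp[x] ** 2 for x in marked)
    return iterations, theta, p_success

iterations, theta, p_success = grover_success(8, lambda x: x == 0xB5)
```

For a single marked element among \(2^8\), this gives 12 iterations, of the order of \(\sqrt{2^8} = 16\), and the measured success probability agrees with the closed form \(\sin ^2((2N+1)\theta )\).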

Quantum Query, Memory and Time Complexity. Most of the complexity lower bounds on quantum algorithms in the literature, as well as the algorithms that meet these bounds, are based on query complexity. As such, they count the number of oracle queries \(O_f\) used, where \(O_f\) is a quantum oracle for a function f or more generally the data that is being accessed.

Notice that classical queries are a particular case of superposition queries, so we consider them alike in what follows.

However, query complexity can be completely different from time complexity: the latter represents the number of elementary quantum gates (unitaries) successively applied to a qubit or a qubit register. It has the same flavor as classical time complexity, since it counts elementary operations applied sequentially.

We emphasize that memory complexity has a different meaning in the quantum and the classical setting. While classical memory is thought of as a database with fast access, quantum memory denotes the number of qubits in the circuit. Having more qubits means that more operations can be applied in parallel, hence decreases the time complexity: it rather corresponds to classical parallelization.

3 State-of-the-Art on Collision and Multi-target Preimage Search

The two problems that we consider in a quantum setting, collision search and multi-target preimage search, are described in this section. We also briefly describe the best known classical algorithms for solving them and their complexities, as well as the previously best known quantum algorithms, which we will improve in Sect. 5. We also discuss how these previous algorithms compare. At the end of this section we additionally provide some examples of common applications of these problems in cryptanalysis.

3.1 Studied Problems

In this work we consider the two following problems:

Problem 1

(Collision finding). Given access to a random function \(H~: \{0,1\}^n \rightarrow \{0,1\}^n \), find \(x,y \in \{0,1\}^n\) with \(x \ne y\) such that \(H(x) = H(y)\).

We consider here a random function which models best the cryptographic functions (encryption functions or hash functions) that we want to study.

Problem 2

(Multi-target preimage search). Given access to a random permutation \(H~: \{0,1\}^n \rightarrow \{0,1\}^n \) and a set \(T = \{ y_1, \ldots , y_{2^t} \}\), find the preimage of one of the \(y_i\) by H, i.e. find \(i \in \{1,\ldots ,2^t\}\) and \(x \in \{0,1\}^n\) such that \(H(x) = y_i\).

The above problem can also be considered when replacing H with a random function.

Previous quantum algorithms in the literature ([19]) for Problems 1 and 2 consider sometimes the case of r-to-1 functions. Although we restrict ourselves to the case of random functions and permutations, which is relevant in cryptographic applications, we remark that the algorithms presented below could be rewritten for r-to-1 functions.

3.2 Classical Algorithms to Solve Them

Collision search. The birthday paradox states that if we draw at random \(2^{n/2}\) elements \(x_i\in \{0,1\}^n\), we will find a collision between two of their images, i.e. \(H(x_i)=H(x_j)\), with good probability (around 0.5), so a collision can be found with \(O(2^{n/2})\) time and memory. Pollard’s rho algorithm [48] reduces the memory complexity to a negligible amount while keeping the same time complexity. No classical algorithm with a single processor exists for finding collisions on a set of \(2^{n}\) elements with a lower time complexity than \(O(2^{n/2})\).
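Both classical approaches can be sketched on a toy function. In the code below, `H` is SHA-256 truncated to 20 bits, used here only as a stand-in for a random function (an assumption for illustration). `birthday_collision` stores every image it has seen, while `rho_collision` is a memoryless search in the spirit of Pollard's rho, implemented with Floyd's cycle finding; it restarts from another point in the unlikely case where the starting point already lies on the cycle.

```python
import hashlib

N_BITS = 20  # toy size (assumption), so that 2^(n/2) is about a thousand

def H(x):
    """Stand-in random function on N_BITS bits: truncated SHA-256."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - N_BITS)

def birthday_collision():
    """Table-based search: about 2^(n/2) evaluations and as much memory."""
    seen = {}
    for x in range(2 ** N_BITS):
        h = H(x)
        if h in seen:
            return seen[h], x
        seen[h] = x
    return None

def rho_collision():
    """Memoryless search: iterate x -> H(x) and detect the cycle (Floyd)."""
    start = 0
    while True:
        tortoise, hare = H(start), H(H(start))
        while tortoise != hare:                  # phase 1: meet inside the cycle
            tortoise, hare = H(tortoise), H(H(hare))
        tortoise = start
        while H(tortoise) != H(hare):            # phase 2: walk to the cycle entry
            tortoise, hare = H(tortoise), H(hare)
        if tortoise != hare:                     # two distinct predecessors
            return tortoise, hare
        start += 1                               # start was on the cycle: retry

pair_table = birthday_collision()
pair_rho = rho_collision()
```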

Parallelizing collision search. In [47] a method for efficiently reducing the time complexity through parallelization is proposed. The total amount of computation is slightly increased, and the time-space product is not smaller than \(O(2^{n/2})\), but the speed-up is linear in the number of processors. The method relies on a common list where all distinguished points found are stored, until a collision among them is found. With m processors, and in a case where all collisions are useful and must be located, the time complexity becomes \(O({2^{n/2}/m}+2.5/\theta )\), where \(\theta \) is the proportion of distinguished points, which has a direct impact on the memory needs.

Multi-target preimage attacks. With respect to this second problem, the best classical algorithm finds one out of \(\ell =2^{t}\) targets with an exhaustive search in \(\varOmega (2^{(n\,-\,t)})\) (see for instance [6]). This complexity follows directly from the fact that a random input is a preimage of one of the \(2^t\) targets with probability \(\frac{2^t}{2^n}\).
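A minimal sketch of this exhaustive search is given below, under stated assumptions: `H` is SHA-256 truncated to 16 bits as a stand-in for a random function, and the \(2^6\) targets are images of randomly drawn secret inputs. Since each trial hits a target with probability about \(2^t/2^n\), one expects of the order of \(2^{n-t} = 2^{10}\) evaluations; the exact count depends on the instance.

```python
import hashlib
import random

N_BITS, T_BITS = 16, 6  # toy sizes (assumptions): 2^6 targets in a 16-bit space

def H(x):
    """Stand-in random function on N_BITS bits: truncated SHA-256."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - N_BITS)

rng = random.Random(0)
secret_inputs = rng.sample(range(2 ** N_BITS), 2 ** T_BITS)
targets = {H(x) for x in secret_inputs}   # the set T given to the attacker

def multi_target_preimage(targets):
    """Evaluate H on successive inputs until one image falls in the target set."""
    for queries, x in enumerate(range(2 ** N_BITS), start=1):
        if H(x) in targets:
            return x, queries
    return None

preimage, queries = multi_target_preimage(targets)
```

The classical membership test `H(x) in targets` is a constant-time hash-set lookup, which is exactly the step that becomes costly in the quantum setting.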

3.3 Previous Quantum Algorithms

Quantum Algorithms for the Collision Problem. The quantum collision search problem was first studied by Brassard, Høyer and Tapp ([20]). Using Grover’s algorithm as a subroutine, they showed that the collision problem for a 2-to-1 function f could be solved using \(\widetilde{O}(2^{n/3})\) queries to \(O_f\) and \(\widetilde{O}(2^{n/3})\) quantum memory. Afterwards, there were many results on query lower bounds for the collision problem ([1, 2, 40]), until a bound \(\varOmega (2^{n/3})\) was reached. Zhandry also extended the collision problem to random functions, which is relevant in a cryptographic setting ([53]), and proved that this bound still held.

Another related and well studied problem is element distinctness, where the question is to decide whether the outputs of the function f are all distinct (or, equivalently, to find a collision if there is at most one). In particular, Ambainis ([3]) presented a quantum walk algorithm for this problem and showed a time complexity of \(\widetilde{O}(2^{2n/3})\), using \(\widetilde{O}(2^{2n/3})\) quantum bits of memory. In [1] \(\varOmega (2^{2n/3})\) was shown to be a query lower bound for this problem, so those results are essentially optimal. It is known that element distinctness can be reduced to collision by gaining a root in the time complexity, which gives an essentially optimal quantum time and memory of \(\widetilde{O}(2^{n/3})\).

Here, we show the original algorithm for collision search from ([20]), that uses Grover. This algorithm has query complexity \(\widetilde{O}(2^{n/3})\) but running time \(\widetilde{O}(2^{2n/3})\). It is also possible to reduce the running time of the algorithm below to \(\widetilde{O}(2^{n/3})\) by using \(\widetilde{O}(2^{n/3})\) quantum processors in parallel.

[Algorithm 1: quantum collision search of Brassard, Høyer and Tapp [20]; listing not recovered]

Limits of existing work. The practical downside of the currently available algorithms for collision is that, although they might require as little as \(\widetilde{O}(2^{n/3})\) time, they would need \(\widetilde{O}(2^{n/3})\) quantum memory, as in ([3]), or even sometimes \(\widetilde{O}(2^{n/3})\) quantum processors, as in ([20]); see also ([33]). Contrary to classical memory, which is cheap, quantum memory is a very costly part of quantum computers. It was argued by Grover and Rudolph ([33]) that a large amount of quantum memory is almost equivalent to a large number of quantum processors. Even if one disagrees with this statement, it is widely believed that if implementations of such algorithms ever exist, they will not be able to use a large amount of quantum memory. A general discussion on the impracticality of known quantum algorithms for collision was also given by Bernstein in [9].

In summary, even if the collision problem can be solved in quantum time \(\widetilde{O}(2^{n/3})\), the current algorithms require the same amount of quantum memory: the quantum time-memory product of such algorithms is \(O(2^{2n / 3})\), and they are arguably impractical, even with a functioning quantum computer. The goal of our work is to present quantum algorithms for those problems that require only a small number of qubits, which will clarify the real advantage of a quantum adversary.

Quantum Algorithms for Multi-target Preimage Search. The multi-target preimage search has been much less studied than quantum collisions. As said before, in the classical setting, the best known algorithm requires time \(\varOmega (2^{n\,-\,t})\). In the quantum setting, we present here a basic algorithm, that uses Grover search, inspired by [20]. Independently of our work, Banegas and Bernstein presented at SAC 2017 a method to perform quantum parallel multi-target preimage search ([7]). It has however little to do with the techniques studied in this paper.

[Algorithm 2: Grover-based multi-target preimage search; listing not recovered]

Algorithm 2 has a query complexity of \(\widetilde{O}(2^{\frac{n\,-\,t}{2}})\). However, the actual time complexity can be much larger. Given a classical description of the set T, the membership oracle \(O_T\) costs either \(\widetilde{O}(2^t)\) quantum memory or \(\widetilde{O}(2^t)\) quantum time. In any case, the time-memory product of this algorithm is at least \(\widetilde{O}(2^t2^{\frac{n\,-\, t}{2}}) = \widetilde{O}(2^{\frac{n \,+\, t}{2}})\). Surprisingly (and quite annoyingly), the best tradeoff would be obtained for \(t=0\), i.e. one preimage only.
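The exponent arithmetic behind this observation is easy to tabulate. The short sketch below works only with base-2 exponents and ignores polynomial factors; the value n = 128 and the sampled values of t are arbitrary choices for the example.

```python
from fractions import Fraction

def grover_multi_target_exponents(n, t):
    """Exponents (base 2) of the query count and of the time-memory product."""
    queries = Fraction(n - t, 2)   # Grover over a space with 2^t solutions
    oracle_cost = Fraction(t)      # membership oracle: 2^t time or memory
    return queries, queries + oracle_cost

n = 128
products = {t: grover_multi_target_exponents(n, t)[1]
            for t in range(0, n + 1, 16)}
best_t = min(products, key=products.get)
```

The product exponent \((n+t)/2\) grows with t: it is 64 for a single target and already 80 for \(t = 32\), so adding targets makes this particular algorithm worse in time-memory terms.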

Comparison of Our Work and Existing Work Using Different Benchmarks. The comparisons between quantum and classical time-memory products are summarized in Tables 1 and 2. Let us now consider different benchmarking scenarios and compare our work with existing work for the collision problem. When considering multiple processors in parallel, we will use the variable s, such that we have access to \(2^s\) processors in parallel.

  • If quantum memory is expensive: our quantum algorithms are the only ones that beat classical algorithms with only O(n) quantum bits, with a single quantum processor. Our algorithms also beat existing quantum algorithms if we compare in terms of quantum time-space product.

  • If quantum memory becomes as cheap as classical memory, but parallelization is hard, then Ambainis’ algorithm will perform better than ours.

  • When comparing to classical algorithms, how should we treat classical vs. quantum memory? If we consider just a time-space product (including classical space) then our single processor algorithm has a time-space product of \(\widetilde{O}(2^{3n/5})\). However, if this is the quantity of interest then we can take \(s = n/5\) in our quantum parallel algorithm and we will obtain a time-space product of \(\widetilde{O}(2^{12n/25}) < \widetilde{O}(2^{n/2})\) which again beats the best classical algorithms with this benchmarking. If we consider that classical memory is very cheap then our algorithms compare even better to the classical ones (if we still reasonably consider the parallelization cost).

Table 1. Algorithms for collision search. The last line is valid for \(s \le n/4\).
Table 2. Algorithms for multi-target preimage search. We consider \(2^s\) processors for the two parallelized algorithms and a single one for the rest.

3.4 Cryptographic Applications of the Problems

Searching for collisions and (multi-target) preimages are recurrent generic problems in symmetric cryptanalysis. We describe here several scenarios whose security would be considerably affected by an improvement in the resolution of these problems by quantum adversaries. The improvements permitted by our algorithms will be detailed in Sect. 6.

Hash Functions. A hash function is a function H that, given a message M of an arbitrary length, returns a value \(H(M)=h\) of a fixed length n. They have many applications in computer security, such as message authentication codes, digital signatures and user authentication. Hash functions must be easy to compute. An “ideal” hash function verifies the following properties:

  • Collision resistance: Finding two messages M and \(M'\ne M\) such that \(H(M) = H(M')\) should cost \(\varOmega (2^{n/2})\) by the birthday paradox [52].

  • Second preimage resistance: Given a message M and its hash H(M), finding another message \(M'\) such that \(H(M) = H(M')\) should cost \(\varTheta (2^{n})\) by exhaustive search (see Footnote 2).

  • Preimage resistance: From a hash h, finding a message M so that \(H(M) = h\) should cost \(\varTheta (2^{n})\) by exhaustive search.

We can see how, if the algorithms for solving collision search or preimages are improved, the offered security of hash functions would be reduced.

Multi-user Setting. In what follows, \(E_K\) will always denote a symmetric cipher under key K of length k, of block size n. We consider \(E_K\) as a random permutation of bit-strings \(E_K~: \{0,1 \}^n \rightarrow \{0,1\}^n\). We consider the setting where an adversary tries to recover the keys of many users of \(E_K\) in parallel. In one of the considered scenarios [13, 14, 22, 45], the adversary tries to recover one key out of the \(2^t\) more efficiently than in the single key setting. It is easy to see how this relates to the multi-target preimage problem: we can for instance consider that all the \(2^t\) users are encrypting the same message, each with a different key, and we recover the corresponding encrypted blocks. This setting seems realistic: it could be the case of users using the CTR operation mode [25], which is one of the two most popular and recommended modes (see for instance [43]), in protocols such as TLS [24]. The users consider \(IV=0\) and different secret keys. Recovering one key out of the \(2^{t}\) would generically cost, in the classical setting, \(2^{k\,-\,t}\) encryptions, for a key of length k. Similar scenarios have been studied in [28] with respect to the Even-Mansour cipher [27] and the Prince block cipher [17].

Collision Attacks on Operation Modes. Using operation modes such as CBC or CTR, block ciphers are secure up to \(2^{n/2}\) encryptions with the same key [44], where collisions start to occur and reveal information about the plaintexts (see Sect. 2.1). The recommendation from the community is to limit the number of blocks encrypted with the same key to \(\ell \ll 2^{n/2}\), but this is not always respected by standards or actual applications. Such an attack scenario is not merely theoretical, as the authors of [11] pointed out.

They proved that when the birthday bound was only weakly enforced, collision attacks were practical against 64-bit block ciphers when using CBC. In their scenario, the attacker lures the user into sending a great number of HTTP requests to a target website, then captures the server’s replies: blocks of sensitive data encrypted under the same key. This attack has time and data complexity \(O(2^{n/2})\) (practical when \(n = 64\)).

Bricks for Cryptanalysis Techniques. Both collision search and multi-target preimage search are often used as bricks in more evolved cryptanalysis techniques, for instance in truncated differential attacks [38] or in impossible differential attacks [12, 37], where the adversary needs to find partial output collisions to perform the attacks. Consequently, any acceleration of the algorithms solving these problems would directly translate into an acceleration of one of the terms of the complexity and, potentially, into an improvement of the complexity of the cryptanalysis technique.

4 The Membership Oracle

In the algorithm of Brassard et al., as well as in the algorithm that will be detailed in Sect. 5, a quantum oracle is needed for computing membership in a large, unstructured set. We formalize here this essential building block.

Definition 1

Given a set T of \(2^t\) n-bit strings, a classical membership oracle is a function \(f_T\) that computes: \(f_T(x) = 1\) if \(x \in T\) and 0 otherwise.

A quantum membership oracle for T is an operator \(O_T\) that computes \(f_T\):

$$ O_T(\left| x \right\rangle \left| b \right\rangle ) = \left| x \right\rangle \left| b \oplus f_T(x) \right\rangle . $$

The model of computation and memory. The set \(T = \{x_1,\dots ,x_{2^t}\}\) for which we want to construct a quantum membership oracle is stored in some classical memory, and we require only classical access to it, meaning that for any \(i \in [1,\dots ,2^t]\), we can efficiently obtain element \(x_i\). Notice that all \(x_i\) are distinct; this is ensured e.g. by the data structure itself or by a preliminary in-place sort in \(\widetilde{O}(2^t)\). We use the following quantum operations:

  • A quantum creation algorithm that takes a classical input x of n bits, and n qubits initialized at \(\left| 0 \right\rangle \) and outputs \(\left| x \right\rangle \) in this register. This can be done in time n by constructing each qubit of \(\left| x \right\rangle \) separately.

  • A quantum unitary COMP defined as follows:

    $$ \forall x,y \in \{0,1\}^n, \forall b \in \{0,1\}, \ COMP(\left| x \right\rangle \left| y \right\rangle \left| b \right\rangle ) := \left| x \right\rangle \left| y \right\rangle \left| b \oplus (\delta _{xy}) \right\rangle .$$
  • A quantum deletion algorithm that takes a classical input x and \(\left| x \right\rangle \) and outputs \(\left| 0 \right\rangle \). This is just done by inverting the creation algorithm.

Using those operations, we describe now how to construct \(O_T\). We start from an input \(\left| x \right\rangle \left| b \right\rangle \) and want to construct \(\left| x \right\rangle \left| b \oplus f_T(x) \right\rangle \). Our construction will be clearly linear and will correspond to a quantum unitary. The idea is simple: for each \(x_i \in T\), we will check whether \(x = x_i\) and update the second register accordingly.

figure c

Proposition 1

The above procedure implements \(O_T\) perfectly, in time \(n2^t\) using \(2n\,+\,1\) qubits of quantum memory.

Proof

The proof is by a straightforward induction. It is easy to see that \(\left| \phi _{i\,+\,1} \right\rangle \) is the state:

$$\begin{aligned} \left| x \right\rangle \left| b \oplus (\delta _{x x_1}) \oplus (\delta _{x x_2}) \ldots \oplus (\delta _{x x_{i}}) \right\rangle . \end{aligned}$$

By definition:

$$\begin{aligned} f_T(x) = 1 \iff x \in T \iff (x = x_1 \vee \ldots \vee x = x_{2^t}) \end{aligned}$$

which implies (all \(x_i\) are distinct):

$$\begin{aligned} \delta _{x x_1} \oplus \delta _{x x_2} \ldots \oplus \delta _{x x_{i}} = (x = x_1 \vee \ldots \vee x = x_{i}) . \end{aligned}$$

The result follows:

$$\begin{aligned} \left| \phi _{2^t\,+\,1} \right\rangle = \left| x \right\rangle \left| b \oplus \delta _{x x_1} \oplus \delta _{x x_2} \ldots \oplus \delta _{x x_{2^t}} \right\rangle = \left| x \right\rangle \left| b \oplus f_T(x) \right\rangle . \end{aligned}$$
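The sequential construction can be illustrated by a classical simulation on computational basis states; by linearity, the same sequence of operations acts correctly on superpositions. This is only a sketch under our own conventions: the function names `create`, `comp` and `membership_oracle` are hypothetical, and the step count charges n per element of T, ignoring constants.

```python
def create(value, n):
    """Creation algorithm on a basis state: load an n-bit value, one bit at a time."""
    return [(value >> i) & 1 for i in range(n)]


def comp(x_reg, y_reg, b):
    """COMP on basis states: b becomes b XOR delta_{xy}."""
    return b ^ int(x_reg == y_reg)


def membership_oracle(x, b, targets, n):
    """Map |x>|b> to |x>|b XOR f_T(x)> by testing x against each x_i in turn.

    Returns the output bit and a step count of n per element of T.
    """
    x_reg = create(x, n)
    steps = 0
    for x_i in targets:              # the x_i are assumed pairwise distinct
        y_reg = create(x_i, n)       # creation of |x_i> in the ancilla register
        b = comp(x_reg, y_reg, b)    # b <- b XOR delta_{x, x_i}
        y_reg = [0] * n              # deletion: the ancilla register is reset to |0>
        steps += n
    return b, steps
```

Only the register holding x, one ancilla register for the current \(x_i\) and the bit b are live at any time, which matches the \(2n\,+\,1\) count of Proposition 1.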

5 Description of Our Quantum Algorithms

In this section we describe our new algorithms for collision and multi-target preimage search. They use three (resp. two) instances of the amplitude amplification procedure (see Theorem 1 in Sect. 2).

5.1 Quantum Algorithm for Collision Finding

Our algorithm, described in Algorithm 4, relies on a balance between the cost of queries to the function and queries to the membership oracle. This balance principle was in fact already considered in [32] to improve the running time of Grover’s algorithm. In the algorithm of Brassard et al., when using only logarithmic quantum memory, each query costs \(O(2^{n / 3})\) time, so there is much room for improvement.

figure d

The way we construct the input space \(S^H_r\) and the membership oracle \(f^H_{L}\) allows us to decrease the projection time while increasing the setup time. Independently of the choice of t and r, the quantum memory complexity remains O(n).

Analysis of the Algorithm. In this section, we make the following simplifying assumptions:

  • The QAA used in our setting outputs exactly the desired state.

  • \(|S^H_r| \approx 2^{n\,-\,r}\).

  • Let us define \(Sol_{f} := \{x : f^H_L(x) = 1\}\). We have \(|Sol_f| \approx 2 \times 2^{t\,-\,r}\). Indeed, each element of L can be mapped with its first coordinate to an element of \(Sol_f\) which corresponds to \(2^{t\,-\,r}\) elements. Each x such that \((x,H(x)) \notin L\) is in \(Sol_f\) with probability \(2^{-n\,+\,(t\,-\,r)}\). Since there are \(2^n \,-\, 2^{t\,-\,r}\) such elements, this corresponds to approximately \(2^{t\,-\,r} - 2^{2(t\,-\,r) - n} \approx 2^{t\,-\,r}\) elements.

  • We omit all factors polynomial in n and consider that the running time of \(O_H\) is 1.

With those assumptions, we get a running time of \(2^{\frac{2n}{5}}\) as we show below. If we remove all the above assumptions, we will still obtain a running time of \(2^{\frac{2n}{5}}\left( |O_H|_{RT} + O(n)\right) \).

Probability of success. We constructed a list L of \(2^{t\,-\,r}\) elements of the form \((x, H(x))\). The algorithm outputs a random \(x \in Sol_f\). Our algorithm succeeds if that element is not in L. Since \(|L| = 2^{t\,-\,r}\) and \(|Sol_f| \approx 2 \times 2^{t\,-\,r}\), we get a good outcome with probability \(\frac{1}{2}\).
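The estimate \(|Sol_f| \approx 2 \times 2^{t\,-\,r}\) can be checked empirically on a toy random function. The sketch below is not part of the algorithm itself and relies on two assumptions of ours: that \(S^H_r\) is the set of inputs whose image starts with r zero bits, and that \(f^H_L(x) = 1\) exactly when H(x) equals the image of some element of L. The function name and the toy sizes are hypothetical.

```python
import random


def sample_solution_set(n=16, r=4, list_size=64, seed=1):
    rng = random.Random(seed)
    domain = 1 << n
    H = [rng.randrange(domain) for _ in range(domain)]  # toy random function

    # Assumption: S_r^H is the set of inputs whose image starts with r zero bits.
    S_r = [x for x in range(domain) if H[x] >> (n - r) == 0]

    # L: list_size elements x of S_r^H (list_size plays the role of 2^(t-r)).
    L = rng.sample(S_r, list_size)
    images_of_L = {H[x] for x in L}

    # Sol_f: every x whose image matches the image of some element of L.
    sol = [x for x in range(domain) if H[x] in images_of_L]
    outside_L = len(set(sol) - set(L))
    return len(S_r), len(sol), outside_L
```

With these toy sizes one expects \(|S^H_r|\) close to \(2^{12}\), \(|Sol_f|\) close to 128, and about half of \(Sol_f\) outside L, in line with the success probability of \(\frac{1}{2}\).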

Time analysis. Recall that an amplification procedure QAA uses two algorithms: a projection oracle \(O_P\) and a setup algorithm \(\mathtt {setup}\) that produces a state \(\left| \phi \right\rangle = \alpha \left| \phi _{P} \right\rangle + \beta \left| \phi _{P}^\bot \right\rangle \).
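The iteration counts used below follow from the standard analysis of amplitude amplification: if the initial success probability is \(p = |\alpha |^2 = \sin ^2\theta \), then after k iterations the success probability is \(\sin ^2((2k+1)\theta )\), so about \(\frac{\pi }{4\sqrt{p}}\) iterations suffice. A minimal numerical sketch (the function names are ours):

```python
import math


def qaa_iterations(p):
    """Number of amplification rounds for initial success probability p."""
    theta = math.asin(math.sqrt(p))
    return math.floor(math.pi / (4 * theta))


def qaa_success(p, k):
    """Success probability after k rounds: sin^2((2k + 1) * theta)."""
    theta = math.asin(math.sqrt(p))
    return math.sin((2 * k + 1) * theta) ** 2
```

For instance, with \(p = 2^{-r}\) the number of rounds grows like \(2^{r/2}\), up to the \(\pi / 4\) constant.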

We decompose our algorithm into four subroutines.

  1. Constructing the list L: an element of L can be constructed in time \(2^{r/2}\) by applying Grover’s search algorithm on the function \(f(x) := 1 \text { if } x \in S^H_r\) and \(f(x) := 0\) otherwise. Since the whole list L contains \(2^{t\,-\,r}\) elements, it can be constructed in time \(2^{t - \frac{r}{2}}\).

  2. Constructing \(\left| \phi _r \right\rangle \): we use an algorithm \(\mathcal {A} = \mathtt {QAA}(\mathtt {setup}_\mathcal {A}, \mathtt {proj}_\mathcal {A})\), where \(\mathtt {setup}_\mathcal {A}\) builds the superposition \(\left| \phi _0 \right\rangle = \frac{1}{2^{n/2}} \sum _{x \in \{0,1\}^n} \left| x \right\rangle \) using a query to \(O_H\) and \(\mathtt {proj}_\mathcal {A} = \sum _{x \in S^H_r} \left| x \rangle \langle x \right| \). Since \(tr(\mathtt {proj}_\mathcal {A} \left| \phi _0 \rangle \langle \phi _0 \right| ) = 2^{\,-\,r}\), we have to perform \(2^{r/2}\) iterations, i.e. make \(2^{r/2}\) calls to \(\mathtt {setup}_\mathcal {A}\) and \(\mathtt {proj}_\mathcal {A}\). Algorithm \(\mathcal {A}\) therefore takes time \(2^{r/2}\).

  3. Constructing \(O_{f^H_{L}}\). The details of this construction appear in Sect. 4. In particular, we saw that \(O_{f^H_{L}}\) runs in time \(2^{t\,-\,r}\) by testing sequentially against the elements of L (recall we dismissed the factor n for simplicity).

  4. Performing the main amplitude amplification: Algorithm \(\mathcal {B} = \mathtt {QAA}(\mathtt {setup}_\mathcal {B} = \mathcal {A}, \mathtt {proj}_\mathcal {B})\), where \(\mathcal {A}\) is the setup routine that constructs state \(\left| \phi _r \right\rangle \), and \(\mathtt {proj}_\mathcal {B} = \sum _{x:f^H_L(x)=1} \left| x \rangle \langle x \right| \). \(O_{\mathtt {proj}_\mathcal {B}}\) can be done with 2 calls to \(O_{f^H_{L}}\).

The probability that a random \(x \in S^H_r\) satisfies \(f_L^H(x) = 1\) is \(\frac{|Sol_f|}{|S^H_r|} = \frac{2 \times 2^{t\,-\,r}}{2^{n\,-\,r}}\), so \(tr(\mathtt {proj}_\mathcal {B} \left| \phi _r \rangle \langle \phi _r \right| ) = \frac{2 \times 2^{t\,-\,r}}{2^{n\,-\,r}} = 2^{\,-\,n \,+\,t \,+\, 1}\) and Algorithm \(\mathcal {B}\) makes \(2^{\frac{n \,-\, t \,-\, 1}{2}}\) calls to \(\mathcal {A}\) and \(O_{f^H_{L}}\). As a result, algorithm \(\mathcal {B}\) runs in time:

$$\begin{aligned} 2^{\frac{n\,-\,t\,-\,1}{2}} \left( \mathtt {setup}_\mathcal {B} + \mathtt {proj}_\mathcal {B} \right) = 2^{\frac{n\,-\,t\,-\,1}{2}} \left( {2^{r/2}} + 2^{t\,-\,r} \right) . \end{aligned}$$

The running time of the procedure is the time to create the list L plus the time to run algorithm \(\mathcal {B}\), which is

$$ 2^{\frac{n\,-\,t\,-\,1}{2}} \left( {2^{r/2}} + 2^{t\,-\,r} \right) + 2^{t \,-\, \frac{r}{2}} .$$

A quick optimization of the above expression imposes \(t = \frac{3n}{5}\) and \(r = \frac{2t}{3} =\frac{2n}{5}\). This realizes a balance in \(\mathcal {B}\) between the cost of the setup and the cost of a projection, and between the cost of \(\mathcal {B}\) and the cost of computing L.

This gives a total running time of \(2^{\frac{2n}{5}}\), up to a small multiplicative factor in n.
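The optimization can be double-checked numerically. The sketch below works with exponents only (it drops the constant \(-1\) in the number of iterations and all factors polynomial in n) and minimizes the largest of the three terms by exhaustive search over integer t and r; the function names are ours.

```python
def collision_exponent(n, t, r):
    """Largest exponent among the three terms of the running time."""
    outer = (n - t) / 2          # iterations of algorithm B
    setup = r / 2                # cost of algorithm A
    proj = t - r                 # cost of the membership oracle
    build_list = t - r / 2       # cost of constructing L
    return max(outer + setup, outer + proj, build_list)


def best_collision_parameters(n):
    """Exhaustive search over integer (t, r); returns (exponent, t, r)."""
    candidates = ((collision_exponent(n, t, r), t, r)
                  for t in range(n + 1) for r in range(t + 1))
    return min(candidates)
```

For n = 100 the search returns t = 60 and r = 40 with exponent 40, i.e. \(t = \frac{3n}{5}\), \(r = \frac{2n}{5}\) and a running time of \(2^{\frac{2n}{5}}\), where all three terms are equal.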

Memory analysis. The quantum amplitude amplification algorithms and the circuit \(O_{f_L^H}\) only require quantum circuits of size O(n): the quantum memory (number of qubits) needed is low. As for the classical memory required, the only data we need to store is the list L that contains \(2^{t\,-\,r} = 2^{\frac{n}{5}}\) elements.

Theorem 2

Let \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) be a random function computable efficiently. There exists a quantum algorithm running in time \(\widetilde{O} \left( 2^{\frac{2n}{5}} \right) \), using \(\widetilde{O} \left( 2^{\frac{n}{5}} \right) \) classical memory and O(n) quantum space, that outputs a collision of H.

5.2 Quantum Algorithm for Multi-target Preimage Search

Here, we are given a function H and a list \(L' = \{y_1, \ldots , y_{2^t} \}\) of \(2^t\) target elements. The goal is to find a preimage of one of them, i.e. an x such that \(H(x) = y_i\) for some i. The algorithm used is very similar to Algorithm 4.

figure e

The only difference with respect to the previous algorithm is that the list \(L'\) of targets has to be read, even in an online manner, to create the sublist L. This operation will take time \(2^t\).

Because the rest of the algorithm remains unchanged, the total running time is:

$$ 2^{\frac{n\,-\,t}{2}} \left( 2^{\frac{r}{2}} + 2^{t\,-\,r} \right) + 2^t $$

which is minimized for \(r = \frac{2t}{3}\) and \(t = \frac{3n}{7}\). We distinguish 2 cases:

  • if \(t \le \frac{3n}{7}\), we take \(r = \frac{2t}{3}\) and the above running time becomes

    $$ 2^{n/2 - t/6} + 2^t \le 2^{n/2 \,-\, t/6 \,+\, 1}. $$
  • if \(t \ge \frac{3n}{7}\), we truncate the list \(L'\) beforehand so that it has \(2^{3n/7}\) elements and we apply our algorithm on this list. The running time will therefore be \(2^{3n/7}\).

Memory analysis. The only data we need to store is the list L that contains \(2^{t\,-\,r} = 2^{\frac{t}{3}}\) elements. We do not have to store all elements of \(L'\): since \(L'\) is analyzed in an online way, we can discard locally every element of \(L'\) that is not in \(S^H_r\) as soon as we receive it. The quantum algorithm is still a circuit of size O(n), without external quantum memory.

Theorem 3

Let \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) be a random permutation. Given a list of \(2^{t}\) elements, with \(t \le \frac{3n}{7}\), there exists a quantum algorithm running in time \(\widetilde{O} \left( 2^{n/2 \,-\, t/6} \right) \), using \(\widetilde{O} \left( 2^{\frac{t}{3}}\right) \) classical memory and O(n) quantum space, that finds the preimage of one of them.

Theorem 4

Let \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) be a random permutation. Given a list of \(2^{\frac{3n}{7}}\) elements, there exists a quantum algorithm running in time \(\widetilde{O} \left( 2^{\frac{3n}{7}} \right) \), using \(\widetilde{O} \left( 2^{\frac{n}{7}}\right) \) classical memory and O(n) quantum space, that finds the preimage of one of them.

A similar analysis can be done with only marginal differences if we replace the random permutation by a random function.
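The two cases above can be summarized by a small helper that evaluates the running-time expression directly, in \(\log _2\) scale, keeping the constant factor 2 and the \(2^t\) term but omitting factors polynomial in n. This is a sketch; the function name is ours.

```python
import math


def multi_target_exponents(n, t):
    """Return (log2 of the running time, log2 of the classical memory).

    Floats are used, so n should stay below roughly 1000 to avoid overflow.
    """
    t_eff = min(t, 3 * n / 7)        # longer lists are truncated to 2^(3n/7)
    r = 2 * t_eff / 3                # balances setup and projection
    time = (2 ** ((n - t_eff) / 2) * (2 ** (r / 2) + 2 ** (t_eff - r))
            + 2 ** t_eff)            # QAA cost plus reading the targets
    return math.log2(time), t_eff - r
```

For \(t \le \frac{3n}{7}\) the first value stays within one bit of \(\frac{n}{2} - \frac{t}{6}\), and the second equals \(\frac{t}{3}\).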

5.3 Parallelization and Time-Space Tradeoff

Assume that the adversary has now \(2^s\) registers of n qubits available. A simple way to trade space (more qubits) for time is to run in parallel multiple instances of the algorithm. We call this process outer parallelization, and emphasize that quantum memory corresponds to the number of quantum processors working in parallel.

List computation. In the case of collision search, computing the list L now costs only \(2^{t\,-\, r/2\,-\,s}\) time. Notice, however, that the number of queries to \(O_H\) remains \(2^{t\,-\,r / 2}\).

Outer parallelization. Our algorithm consists of iterations of an operator that amplifies the amplitude of the good states (recall that \(2^{\frac{n\,-\,t}{2}}\) such iterations are performed). So, instead of running only one instance and getting a good result with probability close to 1, we can run multiple instances in parallel with fewer iterations for each. The number of queries made to \(O_H\) will be the same.

By running \(O(2^s)\) instances, to ensure success probability O(1), we need each of them to have success probability \(O(2^{-s})\). So instead of running \(2^{\frac{n\,-\,t}{2}}\) iterations of the outer amplification procedure, only \(2^{\frac{n\,-\,t\,-\,s}{2}}\) suffice. The running time for collision becomes

$$2^{\frac{n\,-\,t\,-\,s}{2}}\left( 2^{r/2} + 2^{t\,-\,r}\right) + 2^{t\,-\,r/2\,-\,s} .$$

In collision search, the optimum is reached for \(t = \frac{3n}{5} + \frac{3s}{5}\), which gives \(r = \frac{2n}{5} + \frac{2s}{5}\), a classical memory exponent \(t-r = \frac{n}{5} + \frac{s}{5}\) and a time complexity exponent \(t - \frac{r}{2} - s = \frac{2n}{5} - \frac{3s}{5}\).

In order for those parameters to be valid for collision, we need \(n - t - s \ge 0\) with \(t = \frac{3n}{5} + \frac{3s}{5}\) which gives \(s \le \frac{n}{4}\).

For multi-target preimage, the running time becomes

$$2^{\frac{n\,-\,t\,-\,s}{2}}\left( 2^{r/2} + 2^{t\,-\,r}\right) + 2^{t\,-\,s} .$$

The optimal value of r is still \(r = \frac{2}{3}t\). In multi-target preimage search, the optimal value of t is achieved for \(\frac{n}{2} - \frac{t}{6} - \frac{s}{2} = t -s\) or equivalently \(t = \frac{3n}{7} + \frac{3s}{7}\). The running time becomes \(2^{3n/7 - 4s/7}\) and the used classical memory is \(2^{\frac{n \,+\, s}{7}}\).
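The parameter choices for outer parallelization can be verified with exact rational arithmetic. The sketch below plugs the stated values of t and r into the three exponent terms of each running time and reports whether they coincide; the function names and the returned dictionary layout are ours.

```python
from fractions import Fraction


def collision_parallel(n, s):
    t = Fraction(3, 5) * (n + s)
    r = Fraction(2, 3) * t
    outer = (n - t - s) / 2                       # iterations per instance
    terms = (outer + r / 2, outer + (t - r), t - r / 2 - s)
    return {"t": t, "r": r, "time": max(terms), "memory": t - r,
            "balanced": len(set(terms)) == 1}


def preimage_parallel(n, s):
    t = Fraction(3, 7) * (n + s)
    r = Fraction(2, 3) * t
    outer = (n - t - s) / 2
    terms = (outer + r / 2, outer + (t - r), t - s)  # last term: reading the targets
    return {"t": t, "r": r, "time": max(terms), "memory": t - r,
            "balanced": len(set(terms)) == 1}
```

In both cases the three terms are equal, giving the time exponents \(\frac{2n}{5} - \frac{3s}{5}\) and \(\frac{3n}{7} - \frac{4s}{7}\) respectively.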

Theorem 5

(Outer parallelization). Let \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) be a random permutation. Given a list of \(2^{t}\) elements, with \(t \le \frac{3n \,+\, 3s}{7}\), there exists a quantum algorithm with \(2^s\) quantum processors running in time \(\widetilde{O} \left( 2^{n/2 \,-\, t/6 \,-\, s/2} \right) \), using \(\widetilde{O} \left( 2^{\frac{t}{3}}\right) \) classical memory, that finds the preimage of one of them.

Theorem 6

(Outer parallelization). Let \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) be a random permutation. Given a list of \(2^{t}\) elements, with \(t = \frac{3n \,+\, 3s}{7}\), there exists a quantum algorithm with \(2^s\) quantum processors running in time \(\widetilde{O} \left( 2^{3n/7 \,-\, 4s/7} \right) \), using \(\widetilde{O} \left( 2^{\frac{n\,+\,s}{7}}\right) \) classical memory, that finds the preimage of one of them.

Fig. 1. Quantum time-space product for outer parallelization

As shown in Fig. 1, there is a range of values of s where the time-space product is effectively smaller than in the classical setting (where all current algorithms obtain an exponent \(\frac{n}{2}\)). The limit value is \(s = \frac{n}{6}\) for preimage search and \(s = \frac{n}{4}\) for collisions.
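These limit values can be recovered from the time exponents obtained above, assuming the space is counted as the number \(2^s\) of quantum processors, so that the time-space exponent is the time exponent plus s:

$$\begin{aligned} \text {preimage:} \quad&\frac{3n}{7} - \frac{4s}{7} + s = \frac{3n}{7} + \frac{3s}{7} \le \frac{n}{2} \iff s \le \frac{n}{6} , \\ \text {collision:} \quad&\frac{2n}{5} - \frac{3s}{5} + s = \frac{2n}{5} + \frac{2s}{5} \le \frac{n}{2} \iff s \le \frac{n}{4} . \end{aligned}$$

For collisions, this coincides with the validity constraint \(s \le \frac{n}{4}\) on the parameters.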

Inner parallelization. It is also possible to parallelize computations in the algorithm itself, especially its most costly building block: the membership oracle \(O_{f^H_L}\). We studied this and concluded that this way of parallelizing is not as efficient as outer parallelization.

5.4 Accurate Computations and Parameters

In what precedes, we didn’t take into account four sources of possible differences between theory and practice. First, hidden constants: we dismiss the \(\pi / 4\) factor that stems from amplitude amplification. Second, the logarithmic factor n that appears in the membership oracle. Third, the errors that propagate in the amplitude amplification procedure. Fourth, the cost of a query to the oracle \(O_H\). This last one is actually the most relevant parameter.

Let \(2^c\) be the time complexity of a query. We adapt the parameters r and t as follows:

  • In any case, \(r = \frac{2}{3} (t - c + \log _2(n))\) balances the costs;

  • In multi-target preimage search, \(t = \frac{3n}{7} + \frac{4c}{7} + \frac{2\log _2(n)}{7}\) is the new complexity exponent. Notice that our method also amortizes the cost of \(O_H\) w.r.t. a simple Grover search.

  • In collision search, \(t = \frac{3n}{5} - \frac{4c}{5} + \frac{4 \log _2(n)}{5}\) and the time complexity exponent is \(\frac{2n}{5} + \frac{4c}{5} + \frac{\log _2(n)}{5}\).

These computations mean that there is no surprise: the n factor missing above does no more than multiply the time complexity by 4 (\(n = 128\)) or 16 (\(n = 256\)), and by taking into account the cost of a query \(2^c\), the time complexity does not exceed the previous one multiplied by \(2^c\). It even behaves better.
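The adapted parameters can be checked against a simple cost model. The model below is our reading of the analysis, not a statement from the text: the setup costs \(2^{r/2}\) queries of cost \(2^c\) each, the membership oracle costs \(n 2^{t\,-\,r}\), building L for collisions costs \(2^{t\,-\,r}\) Grover searches, and reading the targets for preimages costs \(2^t\). The function name is ours.

```python
import math


def adjusted_parameters(n, c, problem):
    """Return (t, r, exponent terms) for query cost 2^c, in log2 scale."""
    log_n = math.log2(n)
    if problem == "collision":
        t = 3 * n / 5 - 4 * c / 5 + 4 * log_n / 5
    elif problem == "preimage":
        t = 3 * n / 7 + 4 * c / 7 + 2 * log_n / 7
    else:
        raise ValueError("problem must be 'collision' or 'preimage'")
    r = 2 * (t - c + log_n) / 3

    outer = (n - t) / 2              # iterations of the outer amplification
    setup = r / 2 + c                # assumed: 2^(r/2) queries of cost 2^c
    proj = (t - r) + log_n           # assumed: n * 2^(t-r) for the oracle
    if problem == "collision":
        first_phase = (t - r) + r / 2 + c   # building the list L
    else:
        first_phase = t                     # reading the 2^t targets
    return t, r, (outer + setup, outer + proj, first_phase)
```

Under this model the three terms coincide for both problems, and their common value is the stated time complexity exponent.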

In Tables 3 and 4, we give some complexity results without taking into account the n and \(2^c\) factors. We do not take into account ancilla qubits, i.e. additional qubits used during the computation. Detailed studies on the quantum cost of implementing Grover’s algorithm have been made, e.g. in [30] for an AES exhaustive key search and [4] for preimage search on SHA-2 and SHA-3 using Grover. Due to space constraints, we cannot go into the technicalities of quantum implementations and restrict ourselves to high-level comparisons; notice, however, that the two aforementioned articles could help in deriving precise hardware costs for our algorithms.

Table 3. Quantum collision attack – rounded exponents
Table 4. Quantum multi-target preimage attack – rounded exponents

Errors that Propagate in the Amplitude Amplification Procedure. We perform many instances of QAA in our algorithm, so it is important to understand how the errors propagate and to check that they do not induce a large cost. We briefly describe here the behavior of those errors in our algorithm; more detailed computations are available in the long version of this paper [21]. Let us consider our first QAA algorithm, which constructs \(\left| \phi _r \right\rangle \). Two factors can induce errors here: (1) we do not know \(|S_r^H|\) exactly and (2) even with perfect knowledge of the angle used in QAA, the algorithm constructs a state close to \(\left| \phi _r \right\rangle \) but cannot hit it exactly. The second problem is solved by using a construction from [19]. To solve the first problem, we use the fact that H is random, so that the uncertainty remains very small. To give a rough idea, the first QAA will give a state \(\left| \phi _{output} \right\rangle \) such that

$$ | \langle \phi _{output} | \phi _r \rangle | \ge \cos (2^{r/2 \,-\, n/2} + o(2^{r/2 \,-\, n/2})) .$$

This error will then propagate to the second QAA. We have two possible scenarios:

  • For the collision problem, we have \(r = 2n/5\) and we repeat the second QAA \(2^{n/5}\) times. The error in the angle will increase by this factor so the final error will be \(\approx 2^{n/5} 2^{r/2\,-\,n/2} \ll 1\).

  • For the preimage problem, we have \(r = 2n/7\) and we repeat the second QAA \(2^{2n/7}\) times. The error in the angle will increase by this factor so the final error will be \(\approx 2^{2n/7} 2^{r/2\,-\,n/2} \ll 1\).

This means that the final probability of success will be reduced only marginally.

5.5 Many Collisions

For some purposes, we want to retrieve not just one collision pair, but many of them. Suppose \(2^c\) are needed. We modify the parameters in our algorithm to take this into account: now \(t = 3n / 5 + 6c / 5\) and \(r = 2n / 5 + 4c / 5\). Each call returns a collision involving one element of the arbitrary list L of size \(2^{t\,-\,r}\). Hence, we expect \(2^{t\,-\,r}\) such collisions to be found by repeating our algorithm and sharing the list L: this forces \(t-r > c \Rightarrow c < \frac{n}{3}\). Outside this range, c constrains the size of L: we must have \(t-r = c\) and \(t = 3c\), so computing L now costs \(2^{t\,-\,(t\,-\,c) / 2} = 2^{2c}\) and the list has \(2^c\) elements. The time complexity exponent becomes \(\frac{n}{2} + \frac{c}{2}\); it still presents an advantage over classical collision search.

Theorem 7

(Searching many collisions). Given a random function \(H~: \{0,1\}^n \rightarrow \{0,1\}^n\) on n bits, there exists a quantum algorithm using O(n) qubits and outputting \(2^c\) collisions:

  • If \(c < \frac{n}{3}\), in time \(\widetilde{O} \left( 2^{2n / 5 \,+\, 4c / 5}\right) \), using \(2^{n / 5 + 2c / 5}\) classical memory;

  • If \(c > \frac{n}{3}\), in time \(\widetilde{O} \left( 2^{n / 2 \,+\, c / 2} \right) \), using \(2^c\) classical memory.

To ensure that the collisions found are all distinct, one should also multiply these requirements by a small (logarithmic) factor.
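The two regimes of Theorem 7 can be written as a small helper returning the time and classical memory exponents, ignoring the logarithmic factor just mentioned. This is a sketch with a function name of our choosing; exact rationals are used so that the two regimes can be compared at the boundary.

```python
from fractions import Fraction


def many_collisions_exponents(n, c):
    """Return (time exponent, classical memory exponent) for 2^c collisions."""
    n, c = Fraction(n), Fraction(c)
    if c < n / 3:
        return 2 * n / 5 + 4 * c / 5, n / 5 + 2 * c / 5
    return n / 2 + c / 2, c
```

At \(c = \frac{n}{3}\) both regimes give the same values, namely a time exponent of \(\frac{2n}{3}\) and a memory exponent of \(\frac{n}{3}\).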

6 Impact on Symmetric Cryptography

We discuss below the applications of our new algorithms on the cryptographic scenarios detailed in Sect. 3.4.

We ask the reader to keep in mind that these results seemed particularly surprising as it was conjectured that quantum algorithms for solving these problems wouldn’t be more efficient than classical ones.

6.1 On Hash Functions

We consider the setting presented in Sect. 3.4: finding collisions and multi-target preimages on hash functions in a post-quantum world can be considerably accelerated by using the new algorithms. It is important to point out that this can be done considering the Q1 setting for the attacker described in Sect. 2.2: that is, just having access to a quantum computer will allow her to accelerate the attack, and she has no need of access to quantum encryption oracles with superposition queries.

To correspond precisely to the description of the problem, we can consider messages with the same length as the hash value (see Footnote 3). Indeed, to find a collision, the attacker just has to provide the hash function itself as input for Algorithm 4 (Sect. 5.1). Algorithm 4 will output a collision with a complexity of \(\widetilde{O}(2^{2n/5})\) in time and queries, using \(\widetilde{O}(2^{n/5})\) classical memory and O(n) qubits. This is to be compared with the previous best time complexity of \(O(2^{n/2})\).

Finding a preimage out of a set of \(2^t\) generated hash values can be done with Algorithm 5 from Sect. 5.2. It is optimal for \(t=3n/7\) with a cost of \(\widetilde{O}(2^{3n/7})\) in time and queries, using \(\widetilde{O}(2^{n/7})\) classical memory and O(n) qubits. This should be compared to a classical time complexity of \(2^{n\,-\,t}=2^{4n/7}\), or to the previous best quantum attack in \(2^{n / 2}\); ours is the fastest of the three. Tables 3 and 4 give concrete values when the time-space tradeoff is used.

6.2 On the Multi-user Setting

The scenario that we presented in Sect. 3.4 can also be accelerated by Algorithm 5 of Sect. 5.2. In this case, the attacker recovers a list of ciphertexts generated from the same plaintext, each encrypted under a different key of size k (one key per user).

The goal is to recover one key out of the total \(2^{t}\). In this case, we can consider the attacker scenario Q1: we do not need access to a quantum encryption oracle, but instead implement the function that encrypts a fixed plaintext under the key in argument (as we would do for an exhaustive search with a Grover attack). In this case though, the target ciphertexts must be recovered classically. When the key has the same size as the ciphertext, we can directly apply the multi-target preimage search algorithm, that will be optimal for a value of \(2^t=2^{3k/7}\) users. The best time complexity we can achieve here is \(\widetilde{O}(2^{3k/7})\), compared to the previous best classical \(O(2^{4k/7})\) and the previous best quantum \(\widetilde{O}(2^{k / 2})\).

Bigger Key than the State. If the key is bigger than the ciphertext, i.e. \(k=mn\), we re-construct the problem solved by Algorithm 5 by considering that each user encrypts not one, but m fixed plaintexts.

Fewer users than optimal. If the number of users is smaller than \(2^{3k/7}\), the gain is smaller but still considerable with respect to previous attacks. To illustrate this, consider the attack of [28], presented at Asiacrypt 2014, on the Prince block cipher [17]. The authors proposed a technique that already improved on the best previously known classical multi-target preimage attacks: they were able to recover one key of size 128 bits out of \(2^{32}\) in time \(2^{65}\) (compared to the naive \(2^{128-32}=2^{96}\) given by the best generic algorithm). Applying our algorithm in this case, we obtain a time complexity of

$$ 2^{\frac{k}{2} - \frac{t}{6}} + 2^t = 2^{\frac{128}{2} - \frac{32}{6}} + 2^{32} \approx 2^{64\,-\,5.33}=2^{58.67},$$

which improves upon previous results. Our results only need a classical memory of \(2^{18.3}\) and O(n) qubits (compared to a memory need of \(2^{32}\)). Parallelization can also reduce the time complexity in this scenario but, for the sake of simplicity, we do not go into the details and refer the reader to Sect. 5.3.

6.3 On Operation Modes

As a quantum adversary, we can improve the classical collision attack on CBC introduced in Sect. 3.4 with the help of our algorithm from Sect. 5.2. In this scenario, the attacker has to be placed in the Q2 model from Sect. 2.2: she has access to \(2^t\) classically encrypted blocks, to a quantum computing device and also to a quantum encryption oracle using the same secret key K (see Footnote 4). After recovering the \(2^t\) ciphertexts that form the list \(L'=\{C_1,\ldots ,C_{2^t}\}\), we try to find a preimage x of one of them, i.e., find x such that \(E_K(x)=C_i\) for some \(i \in \{1,\ldots ,2^t\}.\) This can be done by directly applying Algorithm 5.

Once we find such an x, we can recover \(P_i\), i.e. the plaintext block that generates \(C_i\) through encryption. Due to the CBC construction, we know that \(E_K(P_i\oplus C_{i\,-\,1})=C_i.\) Therefore, as \(C_{i\,-\,1}\) is known to the attacker, recovering \(x=P_i\oplus C_{i\,-\,1}\) also yields \(P_i\). This can be done by a quantum adversary with a cost in time of \(\widetilde{O}(2^{3n/7})\), compared to the classical \(O(2^{n/2})\).
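The plaintext recovery step can be illustrated on a toy example. In the sketch below, the 12-bit "cipher" is just a seeded random permutation, the IV is assumed to be known to the attacker, and an exhaustive classical search stands in for Algorithm 5; all names are hypothetical and only the final XOR reflects the attack described above.

```python
import random

N_BITS = 12  # toy block size, far too small for real use


def make_cipher(seed):
    """Toy E_K: a random permutation of N_BITS-bit blocks, as a lookup table."""
    rng = random.Random(seed)
    perm = list(range(1 << N_BITS))
    rng.shuffle(perm)
    return perm


def cbc_encrypt(perm, iv, plaintext_blocks):
    """CBC chaining: C_i = E_K(P_i XOR C_{i-1}), with C_0 = IV."""
    prev, out = iv, []
    for p in plaintext_blocks:
        prev = perm[p ^ prev]
        out.append(prev)
    return out


def find_preimage(perm, targets):
    """Classical stand-in for Algorithm 5: find x with E_K(x) among the targets."""
    target_set = set(targets)
    for x in range(1 << N_BITS):
        if perm[x] in target_set:
            return x, perm[x]
    raise ValueError("no preimage found")


def recover_plaintext(perm, iv, ciphertexts):
    """Return (i, P_i) using x = P_i XOR C_{i-1} for the block C_i that was hit."""
    x, c = find_preimage(perm, ciphertexts)
    i = ciphertexts.index(c)
    prev = iv if i == 0 else ciphertexts[i - 1]
    return i, x ^ prev
```

The quantum speedup only concerns `find_preimage`; the last step is a single XOR with a known ciphertext block.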

In Sect. 7 we discuss the impact this attack should have on maximal data length enforcement.

Frequent rekeying. If we consider a scenario where the user can be forced to change his key after a few encryptions, the previous attack can be turned into a key-recovery attack in the Q1 model, with a procedure similar to the multi-user case. We first recover classically \(2^t\) ciphertexts, generated by encrypting one plaintext under \(2^t\) different keys, and then search for a preimage of one of these targets.

6.4 On Bricks for Cryptanalysis Techniques

The last scenario proposed in Sect. 3.4 is less concrete but of great importance. Being very general, the algorithms that we presented here may be used as building blocks by cryptanalysts. With powerful black-box tools and available trade-offs, quantized classical cryptanalysis might become indeed more efficient.

Let us consider as an example the analysis of quantum truncated differential distinguishers from [36]. The aim of the attack is to find a pair of plaintexts with a difference belonging to a certain set \(\varDelta _{in}\), that generates a pair of ciphertexts with a difference belonging to another particular set \(\varDelta _{out}\), which is equivalent to colliding in a part of the output state. The attack succeeds if such a pair is found quicker than for a random permutation. The probability of this happening for the attacked cipher is denoted by \(2^{-h_T}\).

We consider the case where a single structure (see Footnote 5) is enough for finding the good pair statistically, i.e. if \(2^{h_T}\le 2^{2|\varDelta _{in}|\,-\,1}\). The authors remark that finding the good pair will cost \(O(2^{h_T/3})\) queries for a quantum adversary, but this would also cost the same amount in space. We could, instead, apply our new algorithm, allowing the quantum space needed to remain polynomial in n with a time complexity still improved over the classical one.

7 Conclusion

7.1 Efficient Algorithms for Collision and Multi-target Preimage Search

We have presented a quantum algorithm for collision and preimage search, based on the amplitude amplification procedure, that is sub-optimal in terms of query complexity but beats the current best algorithm in terms of time complexity with small quantum memory.

To the best of our knowledge, this is the first generic procedure that solves this problem effectively faster than classically, when only linear quantum space is available. Our algorithm can also be parallelized, providing better performance than classical algorithms for a wide range of parameters.

7.2 Impact on Symmetric Primitives

From the applications presented in Sect. 6, we can obtain the following conclusions:

Open Problem on Best Quantum Multi-target Preimages. In Eurocrypt 2015 [10], Sect. 3.2, the authors notice that the best known post-quantum cost for multi-target preimage attacks is also of \(2^{n/2}\), and they provide the following example: for \(n=256\) and \(2^t\,=\,2^{56}\), they claim that the best quantum algorithm has a cost of \(2^{128}\), though it only needs \(2^{100}\) queries. With our algorithms, this implication does not hold anymore: it is possible to attack their example with a time complexity of

$$2^{100}(2^{56/3}+2^{56/3})+2^{56}\approx 2^{119.6}$$

by applying Algorithm 5, which is clearly better than the classical attack, and using a polynomial amount of qubits.

On Maximal Data Length to Enforce. While attacking operation modes via collisions, \(2^t\) data is recovered classically. This \(2^t\) can be significantly smaller than \(2^{n/2}\), and the attack would still have an advantage over the birthday paradox. In fact, when more data is available, the time complexity of the quantum computations decreases up to the limit \(\widetilde{O} \left( 2^{5n / 12} \right) \) (when \(t = n / 2\)).

We can forget about the term \(2^t\), as we are considering \(t< n/2\), and the quantum procedure has complexity \(\widetilde{O}\left( 2^{\frac{n}{2} - \frac{t}{6}}\right) \), which offers a factor \(2^{\,-\,t / 6}\) compared to classical collision search, independent of the block size n. The security requirements will determine the maximal amount of data that can be generated with a given key.

Is doubling the key length enough? The multi-user scenario, as well as the re-keying one make us wonder about the actual security offered by symmetric ciphers in a post-quantum world. By accelerating collision search, we showed that Grover’s exhaustive key search is not the only general threat posed to them: the block size is also a relevant parameter in quantum attacks.

These results increase our impression that many scenarios and settings should be carefully studied before being able to promise certain security levels in a post-quantum world.

7.3 Open Problems and Future Work

Our result fills a gap that existed between the theoretical query complexity of collision search and the actual requirements of an attack. It follows recent non-intuitive results in quantum symmetric cryptanalysis (see e.g. [35]), that have shown the necessity of a better understanding of how symmetric primitives could be affected by quantum adversaries. In our opinion, many such counter-intuitive results are yet to appear.

This work reopens the direction of designing improved quantum algorithms for collision and preimage finding when the quantum computer does not have access to an exponential amount of quantum memory. The algorithm we propose cannot be dismissed as implausible when proving security against quantum attackers: the quantum memory needed is reasonably small. We have been able to propose several significant complexity improvements thanks to this result. Although 2n / 5 is the optimal exponent of our collision algorithm, it introduces additional structure (a prefix of the image is chosen) that is not relevant in many applications: is it possible to get rid of these specificities and bring the exponent down to n / 3?