1 Introduction

Signature schemes are an important element of many cryptographic applications and are one of the primitives being standardized in the NIST post-quantum competition [Nis17]. Assessing the exact security (and hence the efficiency) of these schemes is therefore a very important task, both against classical and quantum computers. The GPV construction [GPV08] presents a generic construction of signature schemes in the Hash and Sign paradigm. This construction requires a family \(\mathcal {F}\) of trapdoor preimage sampleable functions (TPSF), which informally is a collection of functions that are hard to invert but can easily be inverted with some trapdoor. Their construction has two specific properties:

  1. For each image y, the inversion algorithm that uses the trapdoor should have good distributional properties: it should output a preimage x distributed according to a prescribed distribution D.

  2. The security of the resulting signature scheme is tightly based on the collision resistance of the family \(\mathcal {F}\), not on its one-wayness.

These properties were tailored for lattice-based schemes, where both of them hold. For example, the lattice-based FALCON signature scheme [FHK+17] is based on the GPV construction. Notice that it is also possible to base the security on one-wayness instead of collision resistance. However, in the generic setting, this can lead to non-tight reductions, similarly to Full Domain Hash signatures [BR93, Cor00].

In this paper, we extend the notion of TPSF by requiring property (1) above to hold only on average over y, defining the notion of Average TPSF (ATPSF). A direct use of the leftover hash lemma shows that we can go from an ATPSF to a TPSF with a quadratic loss in the security of \(\mathcal {F}\). What we show is the following:

  • We show that this quadratic loss is not necessary and that we can use ATPSF instead of TPSF without any loss.

  • Applying the GPV construction of signature schemes to a family \(\mathcal {F}\) of ATPSF, we show that the security of the signature scheme is equivalent to solving the Claw with Random Function (Claw(RF)) problem for \(\mathcal {F}\). Informally, in the Claw(RF) problem, we are given a random function \(\mathcal {H}\) and a random f from \(\mathcal {F}\), and we want to find a pair (x, y) such that \(f(x) = \mathcal {H}(y)\).

  • We extend this to the quantum setting and show that our tight and optimal results also hold in the QROM.

  • We apply these results to the Wave signature scheme [DST19a] and show more formally its classical and quantum security.

Recently, Chen, Genise and Mukherjee [CGM19] relaxed the GPV construction and used approximate trapdoor functions, from which they constructed more efficient lattice-based signature schemes. Their relaxation concerns the preimages themselves: the inversion algorithm only outputs approximate preimages. On the other hand, we still require exact preimages, but we only require that the inversion algorithm outputs a preimage close to the target distribution D on average over the images. Our results can therefore be seen as another application of relaxing the GPV construction, but with applications beyond lattices.

One of the implications of our results is that:

$$ \text {Collision} \preccurlyeq \begin{array}{c} \text {Claw(RF)} \\ \Updownarrow \\ \text {Signature} \end{array} \preccurlyeq \text {One way}. $$

This means that the collision problem is easier than the Claw(RF) problem, which is itself easier than the preimage problem. Moreover, attacking the signature scheme is equivalent to solving the Claw(RF) problem in the ROM. In the case of lattices, we have:

$$ \text {Collision} \approx \text {SIS} \preccurlyeq \text { Signature } \preccurlyeq \text {One way} = \text {ISIS} \approx \text {SIS}.$$

From the above diagram, we can see that for lattices the GPV construction gives a tight and optimal reduction to the hardness of inversion. This is because this problem is essentially as hard as finding collisions (\(\text {SIS} \approx \text {ISIS}\)). In the context of code-based cryptography, things are very different. In many regimes used for signatures, the collision problem is actually easy to solve. Therefore, we can only use the non-tight reduction to one-wayness. From there, there are two possibilities: (1) accept the factor associated to non-tightness and take a big loss in parameters, or (2) ignore the non-tightness and assume it will not have a practical impact. Solution (2) is of course very risky, as the security proof of the actual scheme becomes incomplete. On the other hand, those who opt for (1) suffer a loss in parameters which could be unnecessary. The importance of tightness for security reductions is well illustrated in the survey paper [KM19]. For example, there has been a recent attack [KZ19] on the MQDSS scheme [CHR+16] exploiting a non-tightness of the security reduction which was not taken into account.

What we also advocate through our result is that, if we want to study the concrete security of these signature schemes in the ROM, the Claw(RF) problem is the actual problem we should be looking at. Designers who construct a family of ATPSF for which the collision problem is easier than the preimage problem can use our results to study the real security of their schemes. Because we also prove this optimal security in the quantum ROM, our result is especially well suited to post-quantum cryptography.

In order to prove our results, we use fairly standard techniques in the ROM based on reprogramming the hashing and signing oracles. In order to do this formally, we need to keep track of the internal memory of the reprogrammed oracles - an issue that is often discarded in game-based reductions - and not only look at their output distributions. We take the approach of explicitly constructing an algorithm for the Claw(RF) problem from an attacker against the signature scheme instead of using the game formalism, even though we draw heavily on that formalism. An interesting aspect of our proof is that we manage to reprogram only the signing oracle, and not the hashing oracle, which reduces the requirements on the family of ATPSF.

For the quantum case, our proofs mainly use a result by Zhandry [Zha12] on the indistinguishability of close quantum oracles. Here, we need to reprogram the hash function as well, since we cannot work on the internal memory of the quantum oracles. Our proof has some similarities with the one in [BDF+11], where the security of the GPV construction is proven in the asymptotic setting. Our contributions here were to extend this proof to ATPSF, to make concrete security claims, and to show a tight reduction to the Claw(RF) problem.

2 Preliminaries

Probabilistic Notation. Let \(\mathcal {D}\) be a distribution, and X be a random variable. The notation \(X \mathop {\hookleftarrow }\limits ^{\$}\mathcal {D}\) denotes that X is distributed according to \(\mathcal {D}\). Furthermore, for a set S, we will denote by \(\mathcal {U}(S)\) the uniform distribution over S. We use the same notations for picking elements: \(y \mathop {\hookleftarrow }\limits ^{\$}\mathcal {D}\) means that y is picked according to \(\mathcal {D}\) while \(y \mathop {\hookleftarrow }\limits ^{\$}S\) denotes that y is uniformly distributed over S.

Sometimes, when we wish to emphasize which probability space the probabilities or the expectations are taken over, we write on the right of the symbol “:” the random variable specifying the associated probability space. For instance, the probability \(\mathbb {P}(\mathcal {E} : X)\) of the event \(\mathcal {E}\) is taken over the random variable X.

The statistical distance between two discrete probability distributions \(\mathcal {D}_0,\mathcal {D}_1\) over the same space \(\mathcal {E}\) is defined as:

$$ \varDelta (\mathcal {D}_0,\mathcal {D}_1) \mathop {=}\limits ^{\triangle }\frac{1}{2} \sum _{x \in \mathcal {E}} |\mathcal {D}_0(x)-\mathcal {D}_1(x) |. $$

The statistical distance \(\varDelta \) satisfies the triangle inequality.
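For finitely supported distributions, the definition above can be computed directly; a minimal sketch (the coin distributions are made up for the example):

```python
def statistical_distance(d0, d1):
    """Statistical (total variation) distance between two discrete
    distributions given as dicts mapping outcomes to probabilities."""
    support = set(d0) | set(d1)
    return 0.5 * sum(abs(d0.get(x, 0.0) - d1.get(x, 0.0)) for x in support)

# Example: a fair coin vs. a biased coin.
fair = {"h": 0.5, "t": 0.5}
biased = {"h": 0.75, "t": 0.25}
```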

A function \(f(\lambda )\) is said to be negligible, and we denote this by \(f \in \text {negl}(\lambda )\), if for all polynomials \(p(\lambda )\), \(|f(\lambda )| < p(\lambda )^{-1}\) for all sufficiently large \(\lambda \).

For any two sets DR, we denote by \(\mathfrak {F}^D_R\) the set of functions from D to R.

Query Algorithms and Oracles. For any algorithm \(\mathcal {A}\), we denote by \(|\mathcal {A}|\) its total running time. We will also consider query algorithms \(\mathcal {A}^\mathcal {O}\) that make a certain number of calls to an oracle \(\mathcal {O}\). For us, an oracle \(\mathcal {O}\) will be a deterministic or probabilistic function to which we only have black-box access. When we write \(\mathcal {A}^{\mathcal {O}}\), the oracle is unspecified and \(\mathcal {O}\) can be replaced with any oracle.

For a query algorithm \(\mathcal {A}^{\mathcal {O}}\), we write \(|\mathcal {A}^{\mathcal {O}}| = (t,q)\) indicating that its running time is t and that it performs q queries to \(\mathcal {O}\). Unless specified otherwise, the running time of the oracle \(\mathcal {O}\) is 1. An algorithm can also query different oracles, which we indicate as \(\mathcal {A}^{\mathcal {O}_1,\mathcal {O}_2}\) and \(|\mathcal {A}^{\mathcal {O}_1,\mathcal {O}_2}| = (t,q_1,q_2)\) indicates that it runs in time t and it performs \(q_1\) queries to \(\mathcal {O}_1\) and \(q_2\) queries to \(\mathcal {O}_2\).

For any (deterministic or probabilistic) function f, we denote by \(\mathcal {O}_f\) its associated oracle, and we will write it:

figure a

An important concept in this paper is that of oracles with internal memory. We denote by \(\mathcal {O}(x;\mathcal {L})\) a query x to an oracle \(\mathcal {O}\) with internal memory \(\mathcal {L}\). If the result of this query is y and the internal memory is changed to \(\mathcal {L}'\), we write return \((y;\mathcal {L}')\). This internal memory is private and is not part of the public output of the oracle.

One oracle of interest will be the random oracle. It mimics a uniformly chosen random function from \(\mathfrak {F}^D_R\). We will denote this oracle \(\mathcal {O}_{\text {RO}}\) (the sets D and R are implicit).

figure b

This oracle mimics a call to a random function. Each time a fresh x is queried, a random image y is generated. If the same x is queried afterwards, the same output y must be returned. Therefore, we keep a list \(\mathcal {L}\) that stores the pairs (x, y) already specified by the function. If \(\mathcal {L}\) is initialized with \(\emptyset \), there can never be \(x,y,y'\) with \(y' \ne y\) such that \((x,y) \in \mathcal {L}\) and \((x,y') \in \mathcal {L}\). For any algorithm \(\mathcal {A}^{\mathcal {O}}\), we have:

$$ \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_g} \text { outputs } 0 \mid g \mathop {\hookleftarrow }\limits ^{\$}\mathfrak {F}^D_R\right) = \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_{\text {RO}}} \text { outputs } 0 \mid \mathcal {O}_{\text {RO}} \text { is initialized with } \mathcal {L}= \emptyset \right) . $$
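This lazy-sampling implementation of \(\mathcal {O}_{\text {RO}}\) can be sketched as follows (illustrative Python; the range is a small integer set for the example):

```python
import random

class LazyRandomOracle:
    """Lazily samples a uniformly random function D -> R: each fresh
    query x receives a fresh uniform image, stored in the internal
    memory L so that repeated queries are answered consistently."""
    def __init__(self, R, seed=None):
        self.R = list(R)
        self.L = {}                      # internal memory: x -> y
        self.rng = random.Random(seed)

    def query(self, x):
        if x not in self.L:
            self.L[x] = self.rng.choice(self.R)
        return self.L[x]
```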

Another important aspect of query algorithms is that if we consider an algorithm \(\mathcal {A}^{\mathcal {O}}\) and two close oracles \(\mathcal {O}_1,\mathcal {O}_2\), then \(\mathcal {A}^{\mathcal {O}_1}\) and \(\mathcal {A}^{\mathcal {O}_2}\) will be close. This is at the core of the game formalism presented for instance in [Sho04]. More formally,

Proposition 1

Let \(\mathcal {A}^{\mathcal {O}}\) be a query algorithm with \(|\mathcal {A}^{\mathcal {O}}| = (t,q)\). Let \(\mathcal {O}_1,\mathcal {O}_2\) be two oracles such that:

$$ \forall x,\mathcal {L}, \quad \varDelta (\mathcal {O}_1(x;\mathcal {L}),\mathcal {O}_2(x;\mathcal {L})) \le \delta . $$

Then we have:

$$ \left| \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_1} \text { outputs } 0\right) - \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_2} \text { outputs } 0\right) \right| \le q\delta . $$
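The proof is the standard hybrid argument; a sketch (the hybrids \(H_i\) are our notation, not taken from the paper):

```latex
% Define hybrid oracles H_0, ..., H_q, where H_i answers the first i
% queries using O_1 and the remaining q - i queries using O_2, so that
% H_0 = O_2 and H_q = O_1. Neighbouring hybrids differ only on the i-th
% query, whose answer distributions are delta-close for every (x, L):
\left| \mathbb{P}\left(\mathcal{A}^{H_i} \text{ outputs } 0\right)
     - \mathbb{P}\left(\mathcal{A}^{H_{i-1}} \text{ outputs } 0\right) \right| \le \delta .
% Summing over the q neighbouring pairs (triangle inequality) gives
\left| \mathbb{P}\left(\mathcal{A}^{\mathcal{O}_1} \text{ outputs } 0\right)
     - \mathbb{P}\left(\mathcal{A}^{\mathcal{O}_2} \text{ outputs } 0\right) \right|
  \le \sum_{i=1}^{q} \delta = q\delta .
```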

3 Digital Signatures and EUF-CMA Security Model in a Classical/Quantum Setting

A signature scheme S consists of three algorithms \(({\textsc {S.\!keygen}},{\textsc {S.\!sign}},{\textsc {S.\!verify}})\):

  • \({\textsc {S.\!keygen}}(1^\lambda ) \rightarrow (\mathrm {pk},\mathrm {sk})\) is the generation of the public key \(\mathrm {pk}\) and the secret key \(\mathrm {sk}\) from the security parameter \(\lambda \).

  • \({\textsc {S.\!sign}}(m,\mathrm {pk},\mathrm {sk}) \rightarrow \sigma _m\): generates the signature \(\sigma _m\) of a message m from \(m,\mathrm {pk},\mathrm {sk}\).

  • \({\textsc {S.\!verify}}(m,\sigma ,\mathrm {pk}) \rightarrow \{0,1\}\) verifies that \(\sigma \) is a valid signature of m using \(m,\sigma ,\mathrm {pk}\). The output 1 corresponds to a valid signature.

Correctness. A signature scheme is correct if, when we sample \((\mathrm {pk},\mathrm {sk}) \leftarrow {\textsc {S.\!keygen}}(1^\lambda )\), we have for each message m:

$$ {\textsc {S.\!verify}}(m,{\textsc {S.\!sign}}(m,\mathrm {pk},\mathrm {sk}),\mathrm {pk}) = 1. $$

Security Definitions. We consider the EUF-CMA (Existential Unforgeability under Chosen Message Attacks) security of signature schemes. A key pair \((\mathrm {pk},\mathrm {sk}) \leftarrow {\textsc {S.\!keygen}}(1^{\lambda })\) is generated. The goal of the adversary \(\mathcal {A}\) is, knowing only \(\mathrm {pk}\), to construct a pair \((m,\sigma _m)\) such that \(\sigma _m\) is a valid signature for m. The adversary is given some additional power: it can query a signing oracle \(\mathcal {O}_{\mathsf {Sign}}\), which does the following:

figure c

Notice here that the signing oracle has access to \(\mathrm {pk}\) and \(\mathrm {sk}\). The goal of the adversary is then in this case to output a valid signature \(\sigma _{m^*}\) for a message \(m^*\) that has not been queried to the signing oracle.

Definition 1

Let \(\mathcal {A}^{\mathcal {O}}\) be a query algorithm. We define

$$\begin{aligned}&Adv_{\mathcal {S}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(\mathcal {A}^\mathcal {O}) = \mathbb {P}\Big ({\textsc {S.\!verify}}(m^*,\sigma ^*,\mathrm {pk}) = 1 \textsc { and }\, m^* \text { has not} \qquad \quad \;\; \\&\text { been queried in } \mathcal {O}_{\mathsf {Sign}} : (\mathrm {pk},\mathrm {sk}) \leftarrow {\textsc {S.\!keygen}}(1^\lambda ), (m^*,\sigma ^*) \leftarrow \mathcal {A}^{\mathcal {O}_{\mathsf {Sign}}}(\mathrm {pk}) \Big ). \end{aligned}$$

For any time t and number of queries \(q_{sign}\), we define:

$$ Adv_{\mathcal {S}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{sign}) = \max _{\begin{array}{c} \mathcal {A}^{\mathcal {O}} : |\mathcal {A}^{\mathcal {O}}| = (t,q_{sign}) \end{array}} Adv_{\mathcal {S}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(\mathcal {A}^{\mathcal {O}}). $$

For a quantum adversary, we define similarly the quantum EUF-CMA advantage as:

$$ QAdv_{\mathcal {S}}^{\text {EUF-CMA}}(t,q_{sign}) = \max _{\begin{array}{c} \mathcal {A}^{\mathcal {O}} : |\mathcal {A}^{\mathcal {O}}| = (t,q_{sign}) \end{array}} Adv_{\mathcal {S}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}}). $$

where the maximum is over quantum query algorithms that perform classical queries to \(\mathcal {O}_{\mathsf {Sign}}\).
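The classical EUF-CMA experiment can be sketched directly (a minimal Python sketch; the scheme interface and the deliberately insecure toy scheme are ours for illustration, not from the paper):

```python
def euf_cma_experiment(scheme, adversary):
    """EUF-CMA game: the adversary sees pk and a signing oracle, and
    wins by producing a valid signature on a never-queried message."""
    pk, sk = scheme.keygen()
    queried = set()                       # messages sent to O_Sign

    def sign_oracle(m):
        queried.add(m)
        return scheme.sign(m, pk, sk)

    m_star, sigma_star = adversary(pk, sign_oracle)
    return scheme.verify(m_star, sigma_star, pk) == 1 and m_star not in queried


class ToyScheme:
    """Insecure toy scheme (signatures embed the secret key), used only
    to exercise the experiment."""
    def keygen(self):
        return "pk", "sk"

    def sign(self, m, pk, sk):
        return (m, sk)

    def verify(self, m, sigma, pk):
        return 1 if sigma == (m, "sk") else 0
```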

It is actually standard, even when the algorithm is quantum, to consider only classical queries to the signing oracle. This is because, in the real-life scenario that motivates this security definition, signing queries are made to an external party who can force the queries to be classical. In its post-quantum standardization process, NIST indeed requires only security against classical queries to the signing oracle.

4 Family of ATPSF

In this work we use the Full Domain Hash (FDH) paradigm for signature schemes [BR96, Cor02]. The key ingredient of this kind of construction is a trapdoor one-way function \(f:D \rightarrow R\) together with a cryptographic hash function \(\mathcal {H}\). To sign a message m, the corresponding FDH scheme uses the trapdoor to choose a signature \(x \in f^{-1}(\mathcal {H}(m))\). The verification step simply consists in computing \(\mathcal {H}(m)\) and f(x) and checking that \(f(x)= \mathcal {H}(m)\). The difficulty in designing such primitives is that each time a message is signed, the signature is made public while the secret trapdoor has been used to produce it. Therefore, we must ensure that no information about the trapdoor leaks from the produced preimages. In the nice case where f is a permutation, however, this does not matter. Indeed, the hash \(\mathcal {H}(m)\) of the message is classically modeled as random, and thus the inverse \(x = f^{-1}(\mathcal {H}(m))\) is random too, hence distributed independently of the trapdoor. This is typically the case for signature schemes like RSA. Nevertheless, building one-way permutations in the post-quantum world, as in code- or lattice-based cryptography, is a hard condition to meet. Usually [GPV08, DST19b] the functions are many-to-one, and it is then non-trivial to build trapdoor candidates with an inversion algorithm that is oblivious to the trapdoor being used. Building a secure FDH signature in this situation can be achieved by imposing additional properties [GPV08] on the one-way function. This is mostly captured by the notion of Trapdoor Preimage Sampleable Functions (TPSF) [GPV08, Definition 5.3.1]. We express this concept below in a slightly relaxed way, dropping the domain sampleability condition and only assuming that the preimage sampleability property holds on average, and not for every possible element of the function range. This will be sufficient for proving the security of the associated FDH scheme.
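For intuition, the permutation case can be sketched as follows (a toy example: the "trapdoor permutation" is an affine map modulo a small prime and the hash is truncated SHA-256; these concrete choices are ours for illustration and offer no security):

```python
import hashlib

N = 257  # small prime modulus, so x -> a*x + b (mod N) is a permutation

def H(m):
    """Full-domain hash into Z_N (toy instantiation via SHA-256)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def f(x, pk):
    a, b = pk
    return (a * x + b) % N

def f_inv(y, sk):
    a_inv, b = sk
    return (a_inv * (y - b)) % N

def sign(m, sk):
    return f_inv(H(m), sk)          # the signature is x in f^{-1}(H(m))

def verify(m, x, pk):
    return f(x, pk) == H(m)

pk = (3, 7)
sk = (pow(3, -1, 257), 7)           # the trapdoor is the inverse of a mod N
```

Since f is a bijection, the signature x is exactly as random as H(m), so publishing signatures reveals nothing about the trapdoor.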

Definition 2

An \(\varepsilon \)-ATPSF (Average Trapdoor Preimage Sampleable Function family) is a triplet of efficient probabilistic algorithms (\(\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre}\)) where:

  • \(\textsc {TrapGen}(1^\lambda ) \rightarrow (f,\tau )\). Takes the security parameter \(\lambda \) and outputs \(f:D_\lambda \rightarrow R_\lambda \), an efficiently computable function with an efficient description, and \(\tau \), the trapdoor that allows inverting f.

  • \(\textsc {SampDom}(f) \rightarrow x\). Takes a function \(f:D_\lambda \rightarrow R_\lambda \) (with an efficient description) as an input and outputs some \(x \in D_\lambda \).

  • \(\textsc {SampPre}(f,\tau ,y) \rightarrow x\). Takes a function f with associated trapdoor \(\tau \), an element \(y \in R_\lambda \) and outputs \(x \in D_\lambda \) s.t. \(f(x) = y\).

We define:

$$ \varepsilon _{f,\tau } \mathop {=}\limits ^{\triangle }\varDelta \Big (\textsc {SampDom}(f),\textsc {SampPre}(f,\tau ,U(R_\lambda ))\Big ),$$

where \(\textsc {SampPre}(f,\tau ,U(R_\lambda ))\) is sampled as follows: pick \(y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda \), return \(\textsc {SampPre}(f,\tau ,y)\). We require that our triplet of algorithms satisfies

$$\begin{aligned} \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )} \left( \varepsilon _{f,\tau } \right) \le \varepsilon .\end{aligned}$$
(1)
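A toy instance may help to parse this definition (entirely ours, with no cryptographic hardness): take \(f : \mathbb{Z}_{2n} \rightarrow \mathbb{Z}_n\), \(f(x) = x \bmod n\), let \(\textsc{SampDom}\) be uniform on the domain and \(\textsc{SampPre}(f,\tau ,y)\) uniform over the two preimages of y; then \(\textsc{SampPre}(f,\tau ,U(R_\lambda ))\) is exactly uniform on \(\mathbb{Z}_{2n}\), so \(\varepsilon _{f,\tau } = 0\).

```python
import random

n = 8  # toy range size; the domain is Z_{2n}

def trap_gen():
    f = lambda x: x % n   # efficient description: just n
    tau = None            # inverting this toy f needs no real trapdoor
    return f, tau

def samp_dom(rng):
    return rng.randrange(2 * n)       # uniform over D = Z_{2n}

def samp_pre(f, tau, y, rng):
    return y + n * rng.randrange(2)   # uniform over f^{-1}(y) = {y, y + n}
```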

The main difference with the definition of TPSF as given in [GPV08, Definition 5.3.1] is that we consider an average over \(y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda \) instead of requiring the property for almost all y. Furthermore, a TPSF is also required to verify \(\mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )}\left( \varDelta (f(\textsc {SampDom}(f)),U(R_\lambda ))\right) \le \varepsilon '\) (domain sampleability condition) for some \(\varepsilon '\), whereas we, a priori, do not require anything of this kind for an ATPSF. We now show that an \(\varepsilon \)-ATPSF family satisfies the domain sampleability condition of [GPV08].

Proposition 2

Let \(\mathcal {F}= (\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre}\)) be a collection of \(\varepsilon \)-ATPSF. We have, for any \((f,\tau )\) output by \(\textsc {TrapGen}(1^\lambda )\),

$$\begin{aligned} \varDelta (f(\textsc {SampDom}(f)),U(R_\lambda )) \le \varepsilon _{f,\tau } \end{aligned}$$
(2)

where for a fixed f, \(f(\textsc {SampDom}(f))\) is the distribution which is sampled as follows: \(x \leftarrow \textsc {SampDom}(f),\) return f(x). Furthermore,

$$\begin{aligned} \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )}\left[ \varDelta (f(\textsc {SampDom}(f)),U(R_\lambda ))\right] \le \varepsilon \end{aligned}$$
(3)

Proof

We write

$$\begin{aligned} \varepsilon _{f,\tau }&= \varDelta \Big (\textsc {SampDom}(f),\textsc {SampPre}(f,\tau ,U(R_\lambda ))\Big ) \\&\ge \varDelta \Big (f(\textsc {SampDom}(f)),f(\textsc {SampPre}(f,\tau ,U(R_\lambda )))\Big ) \\&=\varDelta (f(\textsc {SampDom}(f)),U(R_\lambda )) \end{aligned}$$

where the first inequality uses the fact that for any deterministic function f and random variables X and Y, \(\varDelta (f(X),f(Y)) \le \varDelta (X,Y)\) (see [GM02] for a proof). This proves Eq. (2). We conclude the proof by taking the expectation over \((f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )\).    \(\square \)
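The data-processing inequality \(\varDelta (f(X),f(Y)) \le \varDelta (X,Y)\) invoked in this proof follows in one line from the triangle inequality (a sketch in our notation, for a deterministic f):

```latex
\varDelta\big(f(X), f(Y)\big)
  = \frac{1}{2} \sum_{z} \Big| \sum_{x \,:\, f(x) = z} \big( \mathbb{P}(X = x) - \mathbb{P}(Y = x) \big) \Big|
  \le \frac{1}{2} \sum_{z} \sum_{x \,:\, f(x) = z} \big| \mathbb{P}(X = x) - \mathbb{P}(Y = x) \big|
  = \varDelta(X, Y).
```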

We now also show that we can replace the average property of the ATPSF with one that holds for almost all y, at the cost of a square-root loss in the sampleability error \(\varepsilon \).

Proposition 3

([S19]). Let \(\mathcal {F}= (\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre}\)) be an \(\varepsilon \)-ATPSF and let \((f,\tau )\) be an output of \(\textsc {TrapGen}(1^\lambda )\). We have

$$ \frac{1}{|R_\lambda |} \cdot \# \left\{ y \in R_{\lambda } : \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_{y} \right) > \sqrt{\varepsilon _{f,\tau }} \right\} \le 2\sqrt{\varepsilon _{f,\tau }}, $$

where \(X_{y}\) denotes the distribution of \(X_{y}\mathop {\hookleftarrow }\limits ^{\$}\textsc {SampDom}(f)\) given \(f(X_{y}) = y\), meaning

$$\begin{aligned} \forall x \in D_{\lambda }, \quad \mathbb {P}\left( X_{y} = x \right) \mathop {=}\limits ^{\triangle }\mathbb {P}\left( \textsc {SampDom}(f) = x \mid f(\textsc {SampDom}(f)) = y\right) . \end{aligned}$$
(4)

Proof

The first part of the proof is to establish the following inequality:

$$\begin{aligned} 2\varepsilon _{f,\tau } \ge \frac{1}{|R_{\lambda }|}\sum _{y\in R_{\lambda }}\varDelta \left( \textsc {SampPre}(f,y),X_{y} \right) \end{aligned}$$
(5)

Let us denote, for all \(y\in R_{\lambda }\) and \(x \in f^{-1}(y)\) (with a slight abuse of notation, as \(p_y\) also depends on x),

$$ p_{y}\mathop {=}\limits ^{\triangle }\mathbb {P}\left( \textsc {SampDom}(f) = x \mid f(\textsc {SampDom}(f)) = y\right) . $$

We have the following computation,

$$\begin{aligned} \varepsilon _{f,\tau }&\!=\varDelta (\textsc {SampPre}(f,\tau ,\mathcal {U}(R_{\lambda })),\textsc {SampDom}(f) )\nonumber \\&\!=\frac{1}{2}\sum _{y}\sum _{x \in f^{-1}(y)} \left| \mathbb {P}\left( \textsc {SampDom}(f) = x\right) - \frac{1}{|R_{\lambda }|} \mathbb {P}(\textsc {SampPre}(f,\tau ,y)=x) \right| \nonumber \\&\!=\frac{1}{2}\sum _{y}\sum _{x \in f^{-1}(y)} \left| \mathbb {P}\left( \textsc {SampDom}(f) = x\right) - \frac{p_{y}}{|R_{\lambda }|} +\frac{p_{y}}{|R_{\lambda }|}-\frac{1}{|R_{\lambda }|} \mathbb {P}(\textsc {SampPre}(f,\tau ,y)=x) \right| \nonumber \\&\!\ge \frac{1}{2}\sum _{y} \frac{1}{|R_{\lambda }|} \sum _{x \in f^{-1}(y)}\! \left| p_{y}- \mathbb {P}(\textsc {SampPre}(f,\tau ,y)=x)\right| \! -\! \frac{1}{2}\sum _{y}\sum _{x\in f^{-1}(y)}p_y \left| \frac{\mathbb {P}\left( \textsc {SampDom}(f)\! =\! x\right) }{p_{y}}\! -\! \frac{1}{|R_{\lambda }|} \right| \nonumber \\&\!= \sum _{y \in R_{\lambda }} \frac{1}{|R_{\lambda }|} \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_y \right) - \frac{1}{2}\sum _{y}\sum _{x\in f^{-1}(y)}p_y \left| \frac{\mathbb {P}\left( \textsc {SampDom}(f) = x\right) }{p_{y}} - \frac{1}{|R_{\lambda }|} \right| \end{aligned}$$
(6)

Now we have for all \(x\in f^{-1}(y)\) and by definition of \(p_{y}\),

$$\begin{aligned} \frac{\mathbb {P}\left( \textsc {SampDom}(f) = x\right) }{p_{y}}&= \frac{\mathbb {P}\left( \textsc {SampDom}(f) = x\right) }{\mathbb {P}\left( \textsc {SampDom}(f) = x \mid f(\textsc {SampDom}(f)) = y\right) }\nonumber \\&= \frac{\mathbb {P}\left( \textsc {SampDom}(f) = x\right) }{\mathbb {P}\left( f(\textsc {SampDom}(f)) = y \mid \textsc {SampDom}(f) = x \right) \frac{\mathbb {P}(\textsc {SampDom}(f)\,=\,x)}{\mathbb {P}(f(\textsc {SampDom}(f))\,=\,y)}} \nonumber \\&= \frac{\mathbb {P}(f(\textsc {SampDom}(f)) = y)}{\mathbb {P}\left( f(\textsc {SampDom}(f)) = y \mid \textsc {SampDom}(f)\,=\,x \right) } \nonumber \\&= \mathbb {P}(f(\textsc {SampDom}(f))\,=\,y) \end{aligned}$$
(7)

where in the last line we used the fact that \(f(x)\,=\,y\). Therefore, by putting (7) in (6) and using that \(\sum _{x\in f^{-1}(y)} p_y =1\) we get,

$$\begin{aligned} \varepsilon _{f,\tau }&\ge \sum _{y \in R_{\lambda }} \frac{1}{|R_{\lambda }|} \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_y \right) - \varDelta (f(\textsc {SampDom}(f)),\mathcal {U}(R_{\lambda })) \\&\ge \sum _{y \in R_{\lambda }} \frac{1}{|R_{\lambda }|} \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_y \right) - \varepsilon _{f,\tau }. \end{aligned}$$

where the last inequality comes from Proposition 2. This proves Eq. (5). In order to conclude, we write

$$ \sum _{y \in R_{\lambda }} \frac{1}{|R_{\lambda }|} \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_y \right) \ge \frac{\sqrt{\varepsilon _{f,\tau }}}{|R_\lambda |} \cdot \# \left\{ y \in R_{\lambda } : \varDelta \left( \textsc {SampPre}(f,\tau ,y),X_{y} \right) > \sqrt{\varepsilon _{f,\tau }} \right\} .$$

Plugging this into Eq. (5), we get the desired result.    \(\square \)

4.1 Constructing a Signature Scheme from ATPSF

As pointed out in [S19], the fact that a collection of ATPSF verifies the preimage property for almost all inputs is enough to build a signature scheme as in [GPV08] and to use the security reduction given in [GPV08, Proposition 6.1]. Nevertheless, by doing this we lose a square-root factor in the sampleability error. We propose here to generalize the construction of [GPV08] by adding a random salt in the signing algorithm. More precisely, given a collection of ATPSF \(\mathcal {F}= (\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre}\)), we define the following Full Domain Hash signature scheme \(\textsc {S}^{\mathcal {F}}\): select a cryptographic hash function \({\mathcal {H}}: \{0,1\}^{*} \times \{0,1\}^{\lambda _0} \rightarrow R_{\lambda }\) and a random salt r of size \(\lambda _{0}\) (\(\lambda _0\) will be specified later). Consider the following three algorithms of the signature scheme \(\textsc {S}^{\mathcal {F}}\):

figure d
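The three algorithms can be sketched as follows (a toy, insecure instantiation: the ATPSF is \(f(x) = x \bmod n\) with uniform preimage sampling, the hash is SHA-256 reduced into the range, and the salt is 16 bytes; all concrete choices are ours for illustration, not from the paper):

```python
import hashlib, os, random

n = 1 << 16  # toy range size; the ATPSF domain is Z_{2n}

def keygen():
    pk = n        # public description of f(x) = x mod n
    sk = None     # toy trapdoor (this f needs none; a real ATPSF would)
    return pk, sk

def H(m, r):
    """Hash of (message, salt) into the range R = Z_n."""
    return int.from_bytes(hashlib.sha256(m + r).digest(), "big") % n

def sign(m, pk, sk):
    r = os.urandom(16)                       # fresh random salt
    x = H(m, r) + n * random.randrange(2)    # SampPre: uniform in f^{-1}(H(m, r))
    return (r, x)

def verify(m, sigma, pk):
    r, x = sigma
    return 1 if x % n == H(m, r) else 0
```

The salt makes repeated signatures of the same message hash to independent targets, which is what the average-case (rather than worst-case) preimage property exploits.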

Our aim in what follows is to give a tight security reduction for this scheme using directly the average property of ATPSF. In order to do so, we must first define the computational problems we reduce to, and in particular we introduce our Claw(RF) problem. This is the aim of the following section.

The Random Oracle Model (ROM) in This Construction. In the random oracle model, we replace the function \(\mathcal {H}\) with a random function \(h : \{0,1\}^{*} \times \{0,1\}^{\lambda _0} \rightarrow R_\lambda \) to which we only give black box access. Recall the EUF-CMA advantage of \(\textsc {S}^{\mathcal {F}}\):

$$\begin{aligned}Adv_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Sign}}}) = \mathbb {P}\Big (\mathcal {H}(m^*,r^*) = f(x^*) \textsc { and } m^* \text { has not been } \qquad \quad \\ \text { queried in } \mathcal {O}_{\mathsf {Sign}} : (\mathrm {pk},\mathrm {sk}) \leftarrow {\textsc {S.\!keygen}}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {A}^{\mathcal {O}_{\mathsf {Sign}}}(\mathrm {pk}) \Big ) \end{aligned}$$

where \(\mathcal {O}_{\mathsf {Sign}}\) is the oracle defined in Sect. 3. The ROM assumption says that any algorithm can only use \(\mathcal {H}\) in a black-box fashion and that \(\mathcal {H}\) behaves as a random function. This means that \(\mathcal {A}\) can be seen as a query algorithm with access not only to the signing oracle but also to the function \(\mathcal {H}\), and that the EUF-CMA advantage is equal to the following one:

$$\begin{aligned} \quad \mathbb {P}\Big (h(m^*,r^*) = f(x^*) \textsc { and } m^* \text { has not been } \text {queried in } \mathcal {O}_{\mathsf {Sign}} : h \mathop {\hookleftarrow }\limits ^{\$}\mathfrak {F}^{\{0,1\}^* \times \{0,1\}^{\lambda _0}}_{R_\lambda } \\ (\mathrm {pk},\mathrm {sk}) \leftarrow {\textsc {S.\!keygen}}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {A}^{\mathcal {O}_{\mathsf {Sign}},\mathcal {O}_h}(\mathrm {pk}) \Big ). \qquad \;\; \end{aligned}$$

5 One-Wayness, Collision Resistance and the Claw with Random Function Problem

The interest in using trapdoor functions for signatures is that these functions should be hard to invert without the trapdoor \(\tau \). Ideally, we want to reduce the security of the signature scheme to the hardness of inverting the function. However, this is not always possible and we have to reduce the security to other problems.

5.1 Definitions

We first present the notion of advantage related to one-wayness and collision finding. We then define our Claw(RF) problem and the associated advantage.

Definition 3

Let \(\mathcal {F}\) \(= (\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre})\) be an ATPSF. For any algorithm \(\mathcal {A}\), we define:

$$\begin{aligned} Adv_{\mathcal {F}}^{OW}(\mathcal {A})&\mathop {=}\limits ^{\triangle }\mathbb {P}\left( f(x) = y : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda , \ x \leftarrow \mathcal {A}(f,y) \right) , \\ Adv_{\mathcal {F}}^{Coll}(\mathcal {A})&\mathop {=}\limits ^{\triangle }\mathbb {P}\left( f(x_1) = f(x_2) \wedge x_1 \ne x_2 : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x_1,x_2) \leftarrow \mathcal {A}(f) \right) . \end{aligned}$$

For any time t, we also define

$$\begin{aligned} Adv_{\mathcal {F}}^{OW}(t)&\mathop {=}\limits ^{\triangle }\max _{\mathcal {A}: |\mathcal {A}| = t} Adv_{\mathcal {F}}^{OW}(\mathcal {A}), \\ Adv_{\mathcal {F}}^{Coll}(t)&\mathop {=}\limits ^{\triangle }\max _{\mathcal {A}: |\mathcal {A}| = t} Adv_{\mathcal {F}}^{Coll}(\mathcal {A}). \end{aligned}$$

Now, we define the Claw(RF) problem.

figure e

From there, we define the Claw(RF) advantage for any query algorithm \(\mathcal {A}^{\mathcal {O}}\).

Definition 4

Let \(\mathcal {F}\) \(= (\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre})\) be an ATPSF.

$$\begin{aligned}Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}})&\mathop {=}\limits ^{\triangle }\mathbb {P}\left( f(x) = h(y) : h \mathop {\hookleftarrow }\limits ^{\$}\mathfrak {F}^D_R, \ (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x,y) \leftarrow \mathcal {A}^{\mathcal {O}_h}(f)\right) \\&= \mathbb {P}\left( f(x) = \mathcal {O}_{\text {RO}}(y) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x,y) \leftarrow \mathcal {A}^{\mathcal {O}_{\text {RO}}}(f)\right) \end{aligned}$$

For any time t and any number of queries q, we also define

$$\begin{aligned} Adv_{\mathcal {F}}^{Claw(RF)}(t,q)&\mathop {=}\limits ^{\triangle }\max _{\mathcal {A}^\mathcal {O}: |\mathcal {A}^\mathcal {O}| = (t,q)} Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}). \end{aligned}$$

Similarly, if we consider quantum algorithms, we can define the quantum advantages \(QAdv_{\mathcal {F}}^{OW}(t),QAdv_{\mathcal {F}}^{Coll}(t)\) and \( QAdv_{\mathcal {F}}^{Claw(RF)}(t,q)\) where we maximize over quantum query algorithms. In the case of \(QAdv_{\mathcal {F}}^{Claw(RF)}(t,q)\), we allow quantum queries to \(\mathcal {O}_h\).
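To make the problem concrete, here is a brute-force Claw(RF) solver against a toy function (the instance is ours for illustration; when the image of f covers a constant fraction of the range, a claw is typically found after a handful of oracle queries):

```python
import random

def claw_rf_bruteforce(f, domain, oracle, max_queries):
    """Brute-force Claw(RF) solver: tabulate the image of f by
    exhaustion over its (small) domain, then query the random oracle
    on fresh inputs y until some h(y) lands in that image.
    Returns a claw (x, y) with f(x) = h(y), or None."""
    preimage = {f(x): x for x in domain}
    for y in range(max_queries):
        if oracle(y) in preimage:
            return preimage[oracle(y)], y
    return None

# Toy instance: f(x) = x^2 mod 31 on 1..15, and a lazily sampled
# random oracle into Z_31.
f = lambda x: (x * x) % 31
domain = range(1, 16)
_rng = random.Random(7)
_cache = {}
def oracle(y):
    if y not in _cache:
        _cache[y] = _rng.randrange(31)
    return _cache[y]
```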

5.2 Relating These Different Advantages

In this section, we present the relationship between the different advantages.

Proposition 4

Let \(\mathcal {F}\) be an \(\varepsilon \)-ATPSF. For any time t, we have

$$\begin{aligned} Adv_{\mathcal {F}}^{OW}(t)&\le Adv_{\mathcal {F}}^{Claw(RF)}(t,1) \\ Adv_{\mathcal {F}}^{Claw(RF)}(t,q)&\le q\cdot Adv_{\mathcal {F}}^{OW}(t) \\ Adv_{\mathcal {F}}^{Claw(RF)}(t,q)&\le Adv_{\mathcal {F}}^{Coll}(t + \widetilde{O}(q)) + q \varepsilon + \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}}\left( \frac{1}{\mathrm {MNP}(f)} \right) \end{aligned}$$

where for \((f,\tau )\leftarrow \textsc {TrapGen}(1^{\lambda })\), the minimal number of preimages \(\mathrm {MNP}(f)\) is

$$ \mathrm {MNP}(f) \mathop {=}\limits ^{\triangle }\min _{y}\left( |\{x : f(x) = y\}|\right) . $$
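For intuition, on a small finite domain \(\mathrm {MNP}(f)\) can be computed by brute force. The sketch below minimizes over the images that are actually attained (for a preimage sampleable f, every \(y \in R_\lambda \) is attained, so the two readings coincide):

```python
from collections import Counter

def mnp(f, domain):
    """Minimal number of preimages: min over attained images y of
    |{x in domain : f(x) = y}|."""
    counts = Counter(f(x) for x in domain)
    return min(counts.values())
```

For f(x) = x mod 3 on {0,...,6}, the image 0 has three preimages and the images 1 and 2 have two each, so MNP(f) = 2.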

Proof

We prove each inequality separately.

\(\mathbf {1.} \ {Adv_{\mathcal {F}}^{OW}(t) \le Adv_{\mathcal {F}}^{Claw(RF)}(t,1)}\).

Let \(\mathcal {A}\) be an algorithm running in time t with one-way advantage \(Adv_{\mathcal {F}}^{OW}(t)\). We consider the following algorithm

$$ \mathcal {B}^{\mathcal {O}_g}(f) \text{: } x_2 \mathop {\hookleftarrow }\limits ^{\$}D, y \mathop {=}\limits ^{\triangle }g(x_2), x_1 \leftarrow \mathcal {A}(f,y), \text{ return } (x_1,x_2). $$

For a random function g with domain D, y is a uniform element of \(R_\lambda \). Moreover, since \(f(x_1) = g(x_2)\) is equivalent to \(f(x_1) = y\), we have \( Adv_{\mathcal {F}}^{OW}(\mathcal {A}) \le Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {B}^{\mathcal {O}_g}).\) Finally, notice that \(\mathcal {B}^{\mathcal {O}_g}\) makes a single call to g and runs in the same time as \(\mathcal {A}\), which concludes the proof.    \(\square \)

\(\mathbf {2.} \ Adv_{\mathcal {F}}^{Claw(RF)}(t,q) \le q\cdot Adv_{\mathcal {F}}^{OW}(t).\)

Let \(\mathcal {A}^\mathcal {O}\) be a query algorithm running in time t, performing q queries to \(\mathcal {O}\), with Claw(RF) advantage \(Adv_{\mathcal {F}}^{Claw(RF)}(t,q)\). Let \(\mathcal {O}_{\text {RO}}(x;\mathcal {L})\) be the random oracle. We construct a new procedure \(\mathcal {O}{''}_{j,y_0}\) which is equivalent to \(\mathcal {O}_{\text {RO}}\) except that the \(j^{\text {th}}\) call outputs \(y_0\). In the internal memory of \(\mathcal {O}{''}_{j,y_0}\), we keep track of an index i that corresponds to the number of times the oracle was queried, plus 1.

figure f

Notice that if j and \(y_0\) are chosen at random then this doesn’t change the behavior of the oracle. We consider the following algorithm

$$ \mathcal {B}(f,y_0)\text{: } j \mathop {\hookleftarrow }\limits ^{\$}\{1,\dots ,q\}, (x_1,x_2) \leftarrow \mathcal {A}^{\mathcal {O}^{''}_{j,y_0}}(f), \text{ return } x_1 $$

Notice that we have replaced in \(\mathcal {A}\) calls to \(\mathcal {O}_{\text {RO}}\) with calls to \(\mathcal {O}^{''}_{j,y_0}\). We write

$$\begin{aligned}&Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}) \\&= \mathbb {P}\left( f(x_1) = \mathcal {O}_{\text {RO}}(x_2) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x_1,x_2) \leftarrow \mathcal {A}^{\mathcal {O}_{\text {RO}}}(f)\right) \\&= \mathbb {P}\Big [f(x_1) = \mathcal {O}{''}_{j,y}(x_2) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), j \mathop {\hookleftarrow }\limits ^{\$}\{1,\dots ,q\}, \ y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda , \ (x_1,x_2) \leftarrow \mathcal {A}^{\mathcal {O}^{''}_{j,y}}(f)\Big ] \\&\ge \frac{1}{q}\mathbb {P}\Big [f(x_1) = y : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), \ j \mathop {\hookleftarrow }\limits ^{\$}\{1,\dots ,q\}, \ y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda , \ (x_1,x_2) \leftarrow \mathcal {A}^{\mathcal {O}^{''}_{j,y}}(f)\Big ] \\&= \frac{1}{q}\mathbb {P}\Big [f(x_1) = y : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), \ y \mathop {\hookleftarrow }\limits ^{\$}R_\lambda , \ x_1 \leftarrow \mathcal {B}(f,y)\Big ] \end{aligned}$$

where the inequality comes from the fact that when \(x_2\) is queried to \(\mathcal {O}^{''}_{j,y}\), this query is the \(j^{\text {th}}\) one with probability \(\frac{1}{q}\) on average over j, in which case \(\mathcal {O}^{''}_{j,y}(x_2) = y\).    \(\square \)
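The reprogrammed oracle \(\mathcal {O}''_{j,y_0}\) can be sketched as follows. This is our reading of the omitted figure: the oracle counts its calls and, when the \(j^{\text {th}}\) call lands on a fresh input, answers \(y_0\) instead of a uniform value; all names are illustrative.

```python
import random

def make_reprogrammed_oracle(j, y0, r_size):
    """O''_{j,y0}: a lazily sampled random oracle whose j-th call,
    if made on a fresh input, is answered with y0."""
    state = {"table": {}, "i": 1}   # i = number of calls made so far + 1
    def oracle(x):
        i = state["i"]
        state["i"] += 1
        if x not in state["table"]:
            state["table"][x] = y0 if i == j else random.randrange(r_size)
        return state["table"][x]
    return oracle
```

With j and \(y_0\) uniform, this oracle is distributed exactly like the unmodified random oracle, which is what the reduction exploits.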

\(\mathbf {3.} \ {Adv_{\mathcal {F}}^{Claw(RF)}(t,q) \le Adv_{\mathcal {F}}^{Coll}(t + \widetilde{O}(q)) + q \varepsilon + \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}}\left( \frac{1}{\mathrm {MNP}(f)}\right) }\).

Let \(\mathcal {A}^\mathcal {O}\) be a query algorithm running in time t, performing q queries to \(\mathcal {O}\) with Claw(RF) advantage \(Adv_{\mathcal {F}}^{Claw(RF)}(t,q)\). We use the random oracle \(\mathcal {O}_{\text {RO}}\) and write

$$\begin{aligned} Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}) = \mathbb {P}\Big (f(x_1) = \mathcal {O}_{\text {RO}}(x_2) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x_1,x_2) \leftarrow \mathcal {A}^{\mathcal {O}_{\text {RO}}}(f)\Big ) \end{aligned}$$
(8)

We now define another procedure \(\mathcal {O}'_f\) that is similar to \(\mathcal {O}_{\text {RO}}\) but we change the way y is sampled.

figure g

Therefore we have:

$$\begin{aligned} \forall x,\mathcal {L}, \quad \ \varDelta \left( \mathcal {O}_{\text {RO}}(x;\mathcal {L}),\mathcal {O}'_f(x;\mathcal {L})\right) \le \varepsilon \end{aligned}$$
(9)

from Proposition 2.

We now consider the following (queryless) algorithm \(\mathcal {B}\): run \(\mathcal {A}^\mathcal {O}\). Each time \(\mathcal {O}\) is called, run \(\mathcal {O}'_f\) and keep track of the internal memory \(\mathcal {L}\) efficiently, maintaining it as a sorted list initialized to \(\mathcal {L}= \emptyset \). The list \(\mathcal {L}\) has size at most q, so each membership query to \(\mathcal {L}\) can be done in time \(O(\log (q))\) and \(\mathcal {B}\) runs in time \(t + \widetilde{O}(q)\). Moreover, since \(\mathcal {O}'_f\) is called q times, using Eqs. (8), (9) and Proposition 1, we have

$$\begin{aligned} Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}) \le \mathbb {P}\Big (f(x_1) = \mathcal {O}'_f(x_2) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x_1,x_2) \leftarrow \mathcal {B}(f)\Big ) + q\varepsilon . \end{aligned}$$

Now, we construct the following algorithm \(\mathcal {C}\): run \(\mathcal {B}\). Each time \(\mathcal {O}'_f(x;\mathcal {L})\) is called, keep track of the value z such that \(f(z) = \mathcal {O}'_f(x)\). Let \(x_1,x_2\) be the output of \(\mathcal {B}(f)\) and let z be such that \(\mathcal {O}'_f(x_2) = f(z)\). Output \((x_1,z)\). Again, \(\mathcal {C}\) runs in time \(t + \widetilde{O}(q)\). We have

$$\begin{aligned} Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}) \le \mathbb {P}\Big (f(x_1) =f(z) : (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ (x_1,z) \leftarrow \mathcal {C}(f)\Big ) + q\varepsilon . \end{aligned}$$

In order to relate this to the collision advantage, we just need to lower bound the probability that \(x_1 \ne z\) in the above. From the construction of \(\mathcal {C}\) and \(\mathcal {O}'_f\), z is a random preimage of \(f(x_1)\). Therefore, \(x_1 \ne z\) with probability at least \(1 - \frac{1}{\mathrm {MNP}(f)}\)Footnote 4. From there, we can conclude

$$ Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}}) \le Adv_{\mathcal {F}}^{Coll}(\mathcal {C}) + q \varepsilon + \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}}\left( \frac{1}{\mathrm {MNP}(f)} \right) . $$

   \(\square \)
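The chain of algorithms \(\mathcal {B}\), \(\mathcal {C}\) from part 3 can be sketched in Python, assuming \(\mathcal {O}'_f\) answers a fresh query x by sampling \(z \leftarrow \textsc {SampDom}(f)\) and returning f(z) while remembering z. The toy f, sampler, and adversary in the usage are hypothetical.

```python
import random

def collision_extractor(f, sampdom, adversary):
    """Algorithm C: run the adversary with the oracle O'_f, recording for
    each query x the preimage z with f(z) = O'_f(x). If the adversary
    outputs (x1, x2) with f(x1) = O'_f(x2), then (x1, z) is a collision
    for f whenever x1 != z."""
    table = {}                      # x -> (y, z) with y = f(z)
    def oracle(x):
        if x not in table:
            z = sampdom(f)
            table[x] = (f(z), z)
        return table[x][0]
    x1, x2 = adversary(f, oracle)
    _y, z = table[x2]
    return x1, z
```

By construction the returned pair always satisfies f(x1) = f(z); it is a genuine collision exactly when x1 differs from the recorded preimage z, which fails with probability at most 1/MNP(f).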

6 Tight Reduction to the Claw Problem, with ATPSF

6.1 Proof of Our Main Theorem

Theorem 1

Let \(\mathcal {F}= (\textsc {TrapGen},\textsc {SampDom},\textsc {SampPre})\) be a collection of \(\varepsilon \)-ATPSF with security parameter \(\lambda \). Let \(\textsc {S}^{\mathcal {F}}\) be the associated Hash and Sign signature scheme with salt size \(\lambda _0\). For any \(t,q_{\text {hash}},q_{\text {sign}}\), we have

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le Adv_{\mathcal {F}}^{Claw(RF)}(\widetilde{O}(t),q_{\text {hash}})\,+\,q_{\text {sign}}\left( \varepsilon + \frac{(q_{sign} + q_{hash})}{2^{\lambda _0}}\right) $$

and by taking \(\lambda _0 = \lambda + 2\log (q_{\text {sign}}) + \log (q_{\text {hash}})\), we have

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le Adv_{\mathcal {F}}^{Claw(RF)}(\widetilde{O}(t),q_{\text {hash}}) + q_{\text {sign}}\varepsilon + \frac{1}{2^{\lambda }}. $$
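As a sanity check on the salt size, a small helper computing \(\lambda _0 = \lambda + 2\log (q_{\text {sign}}) + \log (q_{\text {hash}})\); the query counts in the example are illustrative values of ours, not from the paper.

```python
from math import ceil, log2

def salt_size(lam, q_sign, q_hash):
    """lambda_0 = lambda + 2*log2(q_sign) + log2(q_hash), rounded up
    to a whole number of bits (Theorem 1's choice of salt length)."""
    return ceil(lam + 2 * log2(q_sign) + log2(q_hash))
```

For example, lambda = 128 with 2^64 signing queries and 2^64 hashing queries gives a 320-bit salt.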

Proof

Let \(\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}}\) be an attacker with \(|\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}}| = (t,q_{\text {hash}},q_{\text {sign}})\) such that \(Adv_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(t,q_{\text {hash}},q_{\text {sign}}) = Adv_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}})\). We show how to construct a query algorithm \(\mathcal {C}^{\mathcal {O}}\) to attack the claw with random function property of \(\mathcal {F}\). In the signature scheme \(\textsc {S}^{\mathcal {F}}\), we have the following hash and sign procedures, where the Hash procedure is the Random Oracle.

figure h

Recall that \(\mathcal {L}\) corresponds to the list of input/output pairs already queried to the \(\mathsf {Hash}\) function. Here, both procedures use the same \(\mathcal {L}\), and each time it is updated, the update happens for both procedures at the same time. We first rewrite the \(\mathsf {Sign}\) procedure by replacing the \(\mathsf {Hash}\) procedure inside it with its explicit code:

figure i

Now, we present a new signature procedure \(\mathsf {Sign}'\), that will be close to \(\mathsf {Sign}\) but doesn’t use \(\tau \).

figure j

We made two changes from \(\mathsf {Sign}\) to \(\mathsf {Sign}'\). In the case where \(\exists ! y_0 : (m||r,y_0) \in \mathcal {L}\), \(\mathsf {Sign}'\) outputs \(\bot \). In the other case, \(\mathsf {Sign}'\) also samples x and y differently. We show that these two changes do not significantly change the output distribution of the sign procedure:

Lemma 1

For any \(f,\tau \) as well as m and \(\mathcal {L}\), we have \(\varDelta \big (\mathsf {Sign}(m;\mathcal {L}),\mathsf {Sign}'(m;\mathcal {L})\big ) \le \varepsilon _{f,\tau } + \frac{|\mathcal {L}|}{2^{\lambda _0}}.\)

Proof

We consider the following intermediate procedure \(\mathsf {Sign}_{int}\)

figure k

\(\mathsf {Sign}(m;\mathcal {L})\) and \(\mathsf {Sign}_{int}(m;\mathcal {L})\) only differ when for the random choice \(r \mathop {\hookleftarrow }\limits ^{\$}\{0,1\}^{\lambda _0}\), \(\exists ! y_0: (m||r,y_0) \in \mathcal {L}\). This event happens with probability at most \(\frac{|\mathcal {L}|}{2^{\lambda _0}}\) hence \(\varDelta (\mathsf {Sign}(m;\mathcal {L}),\mathsf {Sign}_{int}(m;\mathcal {L})) \le \frac{|\mathcal {L}|}{2^{\lambda _0}}.\)

Now, let’s look at the distance between \(\mathsf {Sign}_{int}(m;\mathcal {L})\) and \(\mathsf {Sign}'(m;\mathcal {L})\). The only difference in those distributions comes from the way x and y are sampled. Since both in \(\mathsf {Sign}_{int}\) and \(\mathsf {Sign}'\), we have \(y = f(x)\) (and f is deterministic), the only difference comes from the way x is sampled. Therefore,

$$\begin{aligned} \varDelta \left( \mathsf {Sign}_{int}(m;\mathcal {L}),\mathsf {Sign}'(m;\mathcal {L})\right) = \varDelta \left( \textsc {SampPre}(f,\tau ,U(R_\lambda )),\textsc {SampDom}(f)\right) = \varepsilon _{f,\tau } \end{aligned}$$

and we can therefore conclude the proof using the triangle inequality.    \(\square \)

We are now ready to finish the proof of Theorem 1. From an adversary \(\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}}(f)\), we construct an algorithm \(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}(f)\) which corresponds to running \(\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}}\) but with calls to \(\mathcal {O}_{\mathsf {Sign}}\) replaced by calls to \(\mathcal {O}_{\mathsf {Sign}'}\). We also ask \(\mathcal {B}\) to emulate the oracles \(\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}\) by itself. To do this, it initializes \(\mathcal {L}= \emptyset \) and runs these algorithms by itself, updating \(\mathcal {L}\) efficiently via a sorted list. Notice that this was not possible with \(\mathcal {O}_{\mathsf {Sign}}\) because it required \(\tau \), which \(\mathcal {B}\) does not have access to. Let us define \(Adv'(\cdot )\) as:

$$\begin{aligned} \quad Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}) \mathop {=}\limits ^{\triangle }\mathbb {P}\Big (f(x^*) = \mathsf {Hash}(m^*|| r^*) \ \wedge \ (m^*,r^*,x^*) \text { wasn't answered } \\ \text {by } \mathcal {O}_{\mathsf {Sign}'} \text { in } \mathcal {B}: (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}(f)\Big ). \quad \end{aligned}$$

On average over f, the outputs of \(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}\) differ from those of \(\mathcal {A}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_\mathsf {Sign}}(f)\) only because we replaced calls to \(\mathcal {O}_{\mathsf {Sign}}\) with calls to \(\mathcal {O}_{\mathsf {Sign}'}\). There are \(q_{sign}\) such calls, and using Lemma 1, we have:

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}) \le Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}) + q_{sign} \left( \varepsilon + \frac{(q_{sign} + q_{hash})}{2^{\lambda _0}}\right) $$

where we here also averaged over \((f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ).\)

When we first discussed the random oracle model, we showed how when calling an oracle \(\mathcal {O}_g\) for a random g, we could “internalize” the random function into each call of \(\mathcal {O}_{\text {RO}}\). In order to reach the quantity \(Adv_{\mathcal {F}}^{Claw(RF)}\), we have to undo this step and externalize the random function, but we want to keep the internal memory \(\mathcal {L}\) since it can also be modified by \(\mathcal {O}_{\mathsf {Sign}'}\). More precisely, for any function g, we define

figure l

When we run \(\mathsf {Hash}\), each time a fresh x is queried - meaning \(\forall y, (x,y) \notin \mathcal {L}\) - we pick a random value y as its output. Equivalently, we can compute all these possible values y at the beginning, characterized by the values g(x) for a random function g. Therefore, we have

$$\begin{aligned}&Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}) = \mathbb {P}\Big ( \mathsf {Hash}_{g}(m^*||r^*) = f(x^*) \ \wedge \ m^* \text { wasn't queried to } \\&\mathcal {O}_{\mathsf {Sign}'} \text { in } \mathcal {B}: g \mathop {\hookleftarrow }\limits ^{\$}\mathcal {RF}, \ (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {B}^{\mathcal {O}_{\mathsf {Hash}_g},\mathcal {O}_{\mathsf {Sign}'}}(f)\Big ). \end{aligned}$$

Now, for a fixed g, let us characterize \(\mathsf {Hash}_g(m||r)\) for any (m,r). If \(\forall y, (m||r,y) \notin \mathcal {L}\) then \(\mathsf {Hash}_g(m||r) = g(m||r)\). Otherwise, let y be such that \((m||r,y) \in \mathcal {L}\); we distinguish two cases:

  1. 1.

    \((m||r,y)\) was put in \(\mathcal {L}\) after a call to \(\mathsf {Hash}\); then \(\mathsf {Hash}_{g}(m||r) = g(m||r)\).

  2. 2.

    \((m||r,y)\) was put in \(\mathcal {L}\) after a call to \(\mathsf {Sign}'\); then m was queried to \(\mathcal {O}_{\mathsf {Sign}'}\).

Therefore, for any triplet \((m^*,r^*,x^*) \leftarrow \mathcal {B}^{\mathcal {O}_{\mathsf {Hash}_g},\mathcal {O}_{\mathsf {Sign}'}}\), we have:

$$\begin{aligned} \quad m^{*} \text{ is } \text{ not } \text{ queried } \text{ to } \mathcal {O}_{\mathsf {Sign}'} \text{ or } m^* \text{ is } \text{ queried } \text{ and } (x^*,r^*) \text { is not answered by } \mathcal {O}_{\mathsf {Sign}'} \\ \Leftrightarrow \mathsf {Hash}_g(m^*||r^*) =g(m^*||r^*). \quad \quad \quad \, \end{aligned}$$

From there, we have:

$$\begin{aligned} Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}}) = \mathbb {P}\Big (g(m^*||r^*) = f(x^*) : g \leftarrow \mathfrak {F}^{\{0,1\}^{*} \times \{0,1\}^{\lambda _0}}_{R_\lambda }, \qquad \qquad \quad \\ \qquad \qquad \quad (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {B}^{\mathcal {O}_{\mathsf {Hash}_g},\mathcal {O}_{\mathsf {Sign}'}}(f)\Big ). \end{aligned}$$

In order to conclude, notice that the algorithm \(\mathcal {B}^{\mathcal {O}_{\mathsf {Hash}_g},\mathcal {O}_{\mathsf {Sign}'}}\) can be seen as an algorithm \(\mathcal {C}^{\mathcal {O}_g}\) that runs in time \(\widetilde{O}(t)\) and performs \(q_{\text {hash}}\) queries to \(\mathcal {O}_g\), so

$$\begin{aligned}Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}})&= \mathbb {P}\Big (g(m^*||r^*) = f(x^*) : g \leftarrow \mathfrak {F}^{\{0,1\}^{*} \times \{0,1\}^{\lambda _0}}_{R_\lambda }, \\&\qquad \qquad (f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), (m^*,r^*,x^*) \leftarrow \mathcal {C}^{\mathcal {O}_g}(f)\Big ) \\&= Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {C}^{\mathcal {O}_g}) \end{aligned}$$

Putting everything together, we get \( Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {C}^{\mathcal {O}_g}) = Adv'(\mathcal {B}^{\mathcal {O}_\mathsf {Hash},\mathcal {O}_{\mathsf {Sign}'}})\) and

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}) \le Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {C}^{\mathcal {O}_g}) + q_{sign} \left( \varepsilon + \frac{(q_{sign} + q_{hash})}{2^{\lambda _0}}\right) $$

which concludes the proof.    \(\square \)

7 Quantum Security Proof in the QROM

In this section, we prove that the security of \(\textsc {S}^{\mathcal {F}}\) also holds in the quantum setting for a collection \(\mathcal {F}\) of ATPSF. We first present the quantum random oracle model.

7.1 The Quantum Random Oracle Model

The Quantum Random Oracle Model (QROM) is a model where we model a certain function with a random function \(\mathcal {H}\), but since we are in the quantum setting, we have black-box access to \(\mathcal {H}\) and thus also to the unitary \(\mathcal {O}_\mathcal {H}(|x\rangle |y\rangle ) = |x\rangle |\mathcal {H}(x)+y\rangle .\) Unlike in the classical setting, when calling \(\mathcal {O}_{\mathcal {H}}\) for a randomly chosen \(\mathcal {H}\), we cannot generate the values \(\mathcal {H}(x)\) on the fly as we did classically, since a quantum query potentially queries all values \(\mathcal {H}(x)\) at the same timeFootnote 5. Fortunately, we still have tools to reprogram the QROM.

When a function h is drawn uniformly from the set of functions \(\mathfrak {F}^{D}_{\{0,1\}^{m}}\), we can equivalently, for each input \(x\in D\), draw \(h(x) \mathop {\hookleftarrow }\limits ^{\$}\{0,1\}^m\), which fully specifies the function h. For each distribution \(\mathcal {T}\) on \(\{0,1\}^m\), let us consider the distribution of functions \(\text {Fun}_\mathcal {T}\) where \(h \leftarrow \text {Fun}_\mathcal {T}\) means that for each x, \(h(x) \mathop {\hookleftarrow }\limits ^{\$}\mathcal {T}\). In [Zha12], Zhandry showed the following relation.
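Classically, a function \(h \leftarrow \text {Fun}_\mathcal {T}\) can be sampled lazily exactly like a random oracle, drawing each fresh output from \(\mathcal {T}\) and memoizing it; a minimal sketch, where sample_t is any sampler for \(\mathcal {T}\):

```python
def fun_from_dist(sample_t):
    """h <- Fun_T: each fresh input x gets an independent output drawn
    from T, memoized so that h is a well-defined function."""
    table = {}
    def h(x):
        if x not in table:
            table[x] = sample_t()
        return table[x]
    return h
```

This lazy view is exactly what fails quantumly: a superposition query touches all inputs at once, so the outputs cannot be drawn one by one.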

Proposition 5

Let \(\mathcal {A}^{\mathcal {O}}\) be a quantum query algorithm running in time t and making q queries to the oracle \(\mathcal {O}\). Let \(\mathcal {T}\) be a probability distribution on \(\{0,1\}^m\) such that \(\varDelta (\mathcal {T},\mathcal {U}(\{0,1\}^m)) \le \varepsilon \). We have

$$ \left| \ \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_h} \text { outputs } 1 : h \leftarrow \mathfrak {F}^{D}_{\{0,1\}^m}\right) - \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_g} \text { outputs } 1 : g \leftarrow \text {Fun}_\mathcal {T}\right) \ \right| \le \frac{8\pi }{\sqrt{3}}q^{3/2}\sqrt{\varepsilon }.$$

One can compare this to the classical case, which follows directly from Proposition 1.

Proposition 6

Let \(\mathcal {A}^{\mathcal {O}}\) be a classical query algorithm running in time t and making q queries to the oracle \(\mathcal {O}\). Let \(\mathcal {T}\) be a probability distribution on \(\{0,1\}^m\) such that \(\varDelta (\mathcal {T},\mathcal {U}(\{0,1\}^m)) \le \varepsilon \). We have

$$\left| \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_h} \text { outputs } 1 : h \leftarrow \mathfrak {F}^{D}_{\{0,1\}^m}\right) - \mathbb {P}\left( \mathcal {A}^{\mathcal {O}_g} \text { outputs } 1 : g \leftarrow \text {Fun}_{\mathcal {T}}\right) \right| \le q\varepsilon .$$

With Proposition 5, we will be able to prove the quantum security of \(\textsc {S}^{\mathcal {F}}\).

7.2 Tight Quantum Security of \(\textsc {S}^{\mathcal {F}}\)

The goal of this section is to prove the following theorem.

Theorem 2

Let \(\mathcal {F}= (\textsc {TrapGen},\textsc {SampDom},\textsc {SampPre})\) be an \(\varepsilon \)-ATPSF. Let \(\textsc {S}^{\mathcal {F}}\) be the associated Hash and Sign signature scheme. Let \(q = q_{\mathsf {Hash}} + q_{\mathsf {Sign}}\), we have

$$QADV_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le \frac{1}{2} \left( QADV^{Claw(RF)}(\widetilde{O}(t),q_{\text {hash}}) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon } + q_{\text {sign}}\left( \varepsilon + \frac{q_{\text {sign}}}{2^{\lambda _0}}\right) \right) . $$

By taking \(\lambda _0 = \lambda + 2\log (q_{\text {sign}})\), this gives

$$QADV_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le \frac{1}{2} \left( QADV^{Claw(RF)}(\widetilde{O}(t),q_{\text {hash}}) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon } + q_{\text {sign}}\varepsilon + \frac{1}{2^\lambda }\right) . $$

Before proving this statement, we need to add another definition. Let

\(\mathcal {F}= (\textsc {TrapGen},\textsc {SampDom},\textsc {SampPre})\) be an \(\varepsilon \)-ATPSF. We said that \(\textsc {SampDom}(f)\) was an efficient probabilistic algorithm. Here, we need to make this randomness explicit and work with a deterministic algorithm. Let \(\textsc {SampDom}_{det}(f,K)\) be the algorithm which corresponds to running \(\textsc {SampDom}(f)\) with randomness \(K \in \{0,1\}^k\). This means that running \(\textsc {SampDom}(f)\) is done by choosing \(K \mathop {\hookleftarrow }\limits ^{\$}\{0,1\}^k\) and running \(\textsc {SampDom}_{det}(f,K).\) With this new definition, we can prove our theorem.
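The derandomization of \(\textsc {SampDom}\) can be sketched as follows, with a seeded PRG standing in for the k random bits K; sampdom here is assumed to be any probabilistic sampler taking an explicit randomness source, which is our illustrative interface rather than the paper's.

```python
import random

def sampdom_det(sampdom, f, K):
    """SampDom_det(f, K): run SampDom(f) with all of its randomness
    derived from the seed K, making the output a deterministic
    function of (f, K)."""
    rng = random.Random(K)
    return sampdom(f, rng)
```

Running SampDom(f) is then the same as picking K uniformly and calling sampdom_det; in particular, the same K always reproduces the same sample.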

Proof

(of Theorem 2). Fix \(\mathcal {F},\textsc {S}^{\mathcal {F}}\) and let \(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}\) be an adversary in the quantum EUF-CMA model with \(|\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}| = (t,q_{\mathsf {Hash}},q_{\mathsf {Sign}})\) running in time t. In all our discussion, we fix a pair \((f,\tau )\) and consider the \(\mathsf {Hash}\) and \(\mathsf {Sign}\) procedures of \(\textsc {S}^{\mathcal {F}}\) for this fixed pair. We write

\(QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}|(f,\tau ))\) the advantage for this pair \((f,\tau )\) and we have

$$ QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}) = \mathbb {E}_{(f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )} QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}|(f,\tau )).$$

We consider two quantum-accessible pseudo-random functions \(\mathcal {O}_1 : \{0,1\}^* \times \{0,1\}^{\lambda _0} \rightarrow \{0,1\}\) and \(\mathcal {O}_2 : \{0,1\}^* \times \{0,1\}^{\lambda _0} \rightarrow \{0,1\}^k\), modeled as random functions in the QROM. Using these functions, we construct the following function \(\mathsf {Hash}' : \{0,1\}^{*} \times \{0,1\}^{\lambda _0} \rightarrow R_\lambda \):

figure m

First note that we can easily construct an efficient quantum circuit for \(\mathcal {O}_{\mathsf {Hash}'}\) using \(\mathcal {O}_{\mathsf {Hash}}\) and \(\mathcal {O}_1,\mathcal {O}_2\). Also, since \(\mathsf {Hash},\mathcal {O}_1\) and \(\mathcal {O}_2\) are random functions, \(\mathsf {Hash}'(m,r)\) follows a distribution which is at most \(\frac{\varepsilon _{f,\tau }}{2}\)-close to the uniform distribution for each (m,r). Indeed, \(\mathsf {Hash}'(m,r)\) follows the uniform distribution with probability \(\frac{1}{2}\) and the distribution \(\textsc {SampDom}(f)\) with probability \(\frac{1}{2}\), and these two distributions are at distance at most \(\varepsilon _{f,\tau }\) from each other by Proposition 2. This means that

$$ \forall (m,r), \ \varDelta (\mathsf {Hash}(m,r),\mathsf {Hash}'(m,r)) \le \frac{\varepsilon _{f,\tau }}{2}.$$
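Our reading of the \(\mathsf {Hash}'\) construction, sketched classically (the quantum-accessible aspect is ignored): the bit \(\mathcal {O}_1(m,r)\) selects between the original uniform hash value and an image \(f(\textsc {SampDom}_{det}(f,\mathcal {O}_2(m,r)))\). All names below are illustrative stand-ins.

```python
import random

def make_hash_prime(f, sampdom_det, o1, o2, r_size):
    """Hash'(m, r): if O1(m, r) = 0, output the original Hash(m, r)
    (modeled here as a fresh uniform value); if O1(m, r) = 1, output
    f(SampDom_det(f, O2(m, r))). Over a random O1, each branch is
    taken with probability 1/2."""
    table = {}
    def hash_prime(m, r):
        if (m, r) not in table:
            if o1(m, r) == 0:
                table[(m, r)] = random.randrange(r_size)     # Hash branch
            else:
                table[(m, r)] = f(sampdom_det(f, o2(m, r)))  # SampDom branch
        return table[(m, r)]
    return hash_prime
```

On the branch \(\mathcal {O}_1(m,r) = 0\) we recover \(\mathsf {Hash}(m,r)\) exactly, which is the fact used at the end of the proof.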

We also call \(\mathsf {Sign}'\) the procedure \(\mathsf {Sign}\) where we replaced \(\mathsf {Hash}\) with \(\mathsf {Hash}'\). From the above, we have

$$ \forall m, \ \varDelta (\mathsf {Sign}(m),\mathsf {Sign}'(m)) \le \frac{\varepsilon _{f,\tau }}{2}.$$

Using Proposition 5, we get

$$\begin{aligned} QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}|(f,\tau ))&\le QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}'}}|(f,\tau )) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon _{f,\tau }}. \end{aligned}$$

We now change \(\mathsf {Sign}'\) into \(\mathsf {Sign}''\) that doesn’t use the trapdoor and can be emulated with only the public key.

figure n

When calling \(\mathsf {Sign}''(m)\), the r part of the output is a random value in \(\{0,1\}^{\lambda _0}\) such that \(\mathcal {O}_1(m,r) = 1.\) The probability that this r wasn't returned by a previous \(\mathsf {Sign}''\) query is therefore at least \(1 - \frac{q_{\mathsf {Sign}}}{2^{(\lambda _0 - 1)}}.\) When this is the case, the distance between a call to \(\mathsf {Sign}'\) and a call to \(\mathsf {Sign}''\) is equal to \(\varepsilon _{f,\tau }\), since K is uniformly random (using Proposition 2). Therefore, we have for each m

$$ \varDelta (\mathsf {Sign}'(m),\mathsf {Sign}''(m)) \le \frac{2q_{\mathsf {Sign}}}{2^{\lambda _0}} + \varepsilon _{f,\tau }. $$

Using Proposition 5, we get

$$ QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}'}}|(f,\tau )) \le QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}|(f,\tau ))\,+\,q_{\text {sign}}\left( \frac{2q_{\mathsf {Sign}}}{2^{\lambda _0}}\,+\,\varepsilon _{f,\tau }\right) . $$

Putting everything together, we get

$$\begin{aligned} QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}|(f,\tau ))&\le QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}|(f,\tau )) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon _{f,\tau }}\\&\quad + q_{\text {sign}}\left( \frac{2q_{\mathsf {Sign}}}{2^{\lambda _0}} + \varepsilon _{f,\tau }\right) , \end{aligned}$$

and by taking the expectation over \((f,\tau ) \leftarrow \textsc {TrapGen}(1^\lambda )\), we get

$$\begin{aligned} QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}})&\le QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon } + q_{\text {sign}}\left( \frac{2q_{\mathsf {Sign}}}{2^{\lambda _0}} + \varepsilon \right) , \end{aligned}$$

where we used the concavity of the square root function and Jensen's inequality. In order to conclude, let us write

$$\begin{aligned} QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}})&= \mathbb {P}\Big (\mathsf {Hash}'(m^*,r^*) = f(x^*) \text { and } m^* \text { has not been queried to } \mathcal {O}_{\mathsf {Sign}''} : \\&\qquad \qquad (f,\tau ) \leftarrow \textsc {TrapGen}, \ (m^*,x^*,r^*) \leftarrow \mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}(f) \Big ) \\&= \frac{1}{2} \mathbb {P}\Big (\mathsf {Hash}(m^*,r^*) = f(x^*) : \\&\qquad \qquad (f,\tau ) \leftarrow \textsc {TrapGen}, \ (m^*,x^*,r^*) \leftarrow \mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}(f) \Big ) \\&= \frac{1}{2} QADV^{Claw(RF)}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_{\mathsf {Sign}''}}). \end{aligned}$$

Here, we used the fact that if \(m^*\) is not queried in \(\mathsf {Sign}''\), the value of \(\mathcal {O}_1(m^*,r^*)\) is random and is equal to 0 with probability \(\frac{1}{2}.\) When this occurs, we have \(\mathsf {Hash}'(m,r) = \mathsf {Hash}(m,r).\) Finally, notice that \(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_\mathsf {Sign}''}\) can be performed locally with the public key and oracle calls to \(\mathcal {O}_{\mathsf {Hash}}\) so we can write \(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}'},\mathcal {O}_\mathsf {Sign}''} = \mathcal {B}^{\mathcal {O}_{\mathsf {Hash}}}\) for some algorithm \(\mathcal {B}\). We have therefore

$$QADV_{\textsc {S}^{\mathcal {F}}}^{\text {EUF-CMA}}(\mathcal {A}^{\mathcal {O}_{\mathsf {Hash}},\mathcal {O}_{\mathsf {Sign}}}) \le \frac{1}{2} \left( QADV^{Claw(RF)}(\mathcal {B}^{\mathcal {O}_{\mathsf {Hash}}}) + \frac{8\pi }{\sqrt{6}}q^{3/2}\sqrt{\varepsilon } + q_{\text {sign}}\left( \varepsilon + \frac{q_{\text {sign}}}{2^{\lambda _0}}\right) \right) . $$

Notice finally that \(\mathcal {B}\) makes as many queries to \(\mathsf {Hash}\) as \(\mathcal {A}\) and runs in essentially the same time (\(\widetilde{O}(t)\)), which concludes the proof.    \(\square \)

8 Applying the Result to Code-Based Signatures Based on ATPSF

In this section, we present a general analysis of code-based signatures based on ATPSF families. We show that using the tight reduction to the Claw(RF) problem gives better results than using the standard inversion or collision problems. This section is motivated by the WAVE signature scheme [DST19a], which constructs a code-based ATPSF family, but it is relevant for any such construction.

8.1 Canonical Construction of Code-Based ATPSF

We present here the definition of a canonical code-based ATPSF, adapting Definition 2, that we call CBATPSF.

Here the notation \(|\cdot |\) denotes the Hamming weight, i.e., the number of non-zero components of a vector. Furthermore, vectors will be written with bold letters (such as \({\mathbf {e}}\)) and uppercase bold letters are used to denote matrices (such as \({\mathbf {H}}\)). Vectors will be in row notation.

Definition 5

An \(\varepsilon \)-CBATPSF (for Code-Based Average Trapdoor Preimage Sampleable Function family) is an efficient triplet of probabilistic algorithms (\(\textsc {TrapGen}\), \(\textsc {SampDom}\), \(\textsc {SampPre}\)) with parameters n, k, q, w (that can depend on the security parameter \(\lambda \)) where:

  • \(\textsc {TrapGen}(1^\lambda ) \rightarrow ({\mathbf {H}},\tau )\). Takes the security parameter \(\lambda \) and outputs \({\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n}\) and a trapdoor \(\tau \). We also define \(D_\lambda = \{{\mathbf {e}}\in \mathbb {F}_q^n : |{\mathbf {e}}| = w\}\) and \(R_\lambda = \mathbb {F}_q^{n-k}\). The trapdoor function then maps any \({\mathbf {e}}\in D_\lambda \) to \({\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } }\in R_\lambda \).

  • \(\textsc {SampDom}({\mathbf {H}}) \rightarrow {\mathbf {e}}\). Takes a matrix \({\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n}\) and outputs a vector \({\mathbf {e}}\in D_{\lambda }\).

  • \(\textsc {SampPre}({\mathbf {H}},\tau ,{\mathbf {s}}) \rightarrow {\mathbf {e}}\). Takes a matrix \({\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n}\) with associated trapdoor \(\tau \) and an element \({\mathbf {s}}\in R_\lambda \), and outputs \({\mathbf {e}}\in D_\lambda \) s.t. \({\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = {\mathbf {s}}\).

For this definition, the one-wayness, collision and Claw(RF) problems become the following, for a fixed algorithm \(\mathcal {A}\) that outputs elements \({\mathbf {e}}\) in \(D_\lambda \), meaning that \({\mathbf {e}}\in \mathbb {F}_q^{n}\) and \(|{\mathbf {e}}| = w\):

$$\begin{aligned}Adv_{\mathcal {F}}^{OW}(\mathcal {A})&= \mathbb {P}\left( {\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = {\mathbf {s}}: ({\mathbf {H}},\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ {\mathbf {s}}\mathop {\hookleftarrow }\limits ^{\$}\mathbb {F}_q^{n-k}, \ {\mathbf {e}}\leftarrow \mathcal {A}({\mathbf {H}},{\mathbf {s}}) \right) , \\ Adv_{\mathcal {F}}^{Coll}(\mathcal {A})&= \mathbb {P}\left( {\mathbf {e}}_1{{\mathbf {H}}}^{ {\intercal } } = {\mathbf {e}}_2{{\mathbf {H}}}^{ {\intercal } } \wedge {\mathbf {e}}_1 \ne {\mathbf {e}}_2 : ({\mathbf {H}},\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ), \ ({\mathbf {e}}_1,{\mathbf {e}}_2) \leftarrow \mathcal {A}({\mathbf {H}}) \right) , \\ Adv_{\mathcal {F}}^{Claw(RF)}(\mathcal {A}^{\mathcal {O}})&= \mathbb {P}\left( {\mathbf {e}}_1{{\mathbf {H}}}^{ {\intercal } } = h({\mathbf {e}}_2) : h \mathop {\hookleftarrow }\limits ^{\$}\mathfrak {F}^{D_\lambda }_{R_\lambda }, \ ({\mathbf {H}},\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) , \ ({\mathbf {e}}_1,{\mathbf {e}}_2) \leftarrow \mathcal {A}^{\mathcal {O}_h}({\mathbf {H}})\right) . \end{aligned}$$

These problems are directly related to standard problems used in code-based cryptography, as we will see in the next section. They are believed to be hard when the matrix \({\mathbf {H}}\) is chosen uniformly at random from the set of full-rank matrices. However, in the CBATPSF construction, this matrix is generated by \(\textsc {TrapGen}\), so it need not be uniform. We therefore have to argue that these problems remain hard for matrices \({\mathbf {H}}\) generated by \(\textsc {TrapGen}\). A way to do this is to argue that such matrices are computationally indistinguishable from uniformly random matrices of full rank.

Let \({\text {FR}}^{k,n}_q \mathop {=}\limits ^{\triangle }\{{\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n} : {\mathbf {H}}\text { has full rank}\}\), i.e., the set of parity-check matrices of [n, k] codes. We define the advantage \(Adv_{\mathcal {F}}^{TvsU}(\mathcal {A})\) of any algorithm \(\mathcal {A}\) in distinguishing matrices generated by \(\textsc {TrapGen}(1^\lambda )\) from uniformly chosen matrices of full rank:

$$ Adv_{\mathcal {F}}^{TvsU}(\mathcal {A}) \mathop {=}\limits ^{\triangle }\left| \mathbb {P}\left( \mathcal {A}({\mathbf {H}}) \text { outputs } 1 : ({\mathbf {H}},\tau ) \leftarrow \textsc {TrapGen}(1^\lambda ) \right) - \mathbb {P}\left( \mathcal {A}({\mathbf {H}}) \text { outputs } 1 : {\mathbf {H}}\mathop {\hookleftarrow }\limits ^{\$}{\text {FR}}^{k,n}_q \right) \right| .$$

We also define, for any time t: \(Adv_{\mathcal {F}}^{TvsU}(t) = \max _{\mathcal {A} : |\mathcal {A}| = t} Adv_{\mathcal {F}}^{TvsU}(\mathcal {A}).\)

8.2 Relating Hardness of Breaking the CBATPSF with the Hardness of Breaking Standard Code-Based Problems

In this section, we will relate the different advantages (for one-wayness, collision and Claw(RF)) to known problems in code-based cryptography. This will show that using our tight reduction to Claw(RF) gives better results than relying on one-wayness or collision resistance.

One-Wayness vs. Syndrome Decoding. The syndrome decoding problem is the most studied problem in code-based cryptography.

Problem 2

(Syndrome Decoding - SD(n, q, k, w)).

  • Instance: a parity-check matrix \({\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n}\) of rank \(n-k\), a syndrome \({\mathbf {s}}\in \mathbb {F}_q^{n-k}\),

  • Output: \({\mathbf {e}}\in \mathbb {F}_q^{n}\) of Hamming weight w such that \({\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = {\mathbf {s}}\).
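To make the problem statement concrete, here is a toy exhaustive solver (our own illustrative code, not an actual attack): it enumerates all weight-w vectors of \(\mathbb {F}_q^n\) and checks the syndrome equation, and hence runs in time exponential in w. Real solvers, in the information-set-decoding family, are far more sophisticated.

```python
import itertools

def solve_sd(H, s, n, q, w):
    """Exhaustive search for e in F_q^n with |e| = w and e * H^T = s.
    H is given as a list of its n-k rows; returns None on failure."""
    for support in itertools.combinations(range(n), w):
        for values in itertools.product(range(1, q), repeat=w):
            e = [0] * n
            for i, v in zip(support, values):
                e[i] = v
            # check e * H^T = s component by component over F_q
            if all(sum(ei * hi for ei, hi in zip(e, row)) % q == sj
                   for row, sj in zip(H, s)):
                return e
    return None

# Toy instance over F_2: n = 4, n - k = 2, target weight w = 2.
H = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
e = solve_sd(H, s=(1, 1), n=4, q=2, w=2)   # e = [1, 1, 0, 0] works
```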

This problem is believed to be hard when the matrix \({\mathbf {H}}\) is chosen uniformly at random from \({\text {FR}}^{k,n}_q\) and the syndrome \({\mathbf {s}}\) is chosen uniformly at random. This setting has been extensively studied in [Pra62, Ste88, Dum91, Bar97, FS09, MMT11, BJMM12, CS15, MO15, DT17, BM18, BCDL19]. The problem is also known to be NP-complete in the worst case [BMvT78] and there is a search-to-decision reduction (see for instance [FS96]). We define the (average-case) syndrome decoding advantage as follows.

Definition 6

(SD-advantage(n, q, k, w)). For any algorithm \(\mathcal {A}\), we define

$$ Adv_{(n,q,k,w)}^{\text {SD}}(\mathcal {A}) \mathop {=}\limits ^{\triangle }\mathbb {P}\left( {\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = {\mathbf {s}} \text{ and } |{\mathbf {e}}| = w : {\mathbf {H}}\mathop {\hookleftarrow }\limits ^{\$}{\text {FR}}^{k,n}_q, \ {\mathbf {s}}\mathop {\hookleftarrow }\limits ^{\$}\mathbb {F}_q^{n-k}, \ {\mathbf {e}}\leftarrow \mathcal {A}({\mathbf {H}},{\mathbf {s}}) \right) , $$

and for any time t, we also define \( Adv^{\text {SD}}_{(n,q,k,w)}(t) \mathop {=}\limits ^{\triangle }\max _{\mathcal {A} : |\mathcal {A}| = t} Adv^{\text {SD}}_{(n,q,k,w)}(\mathcal {A}). \)

Notice that this is exactly the one-wayness advantage of the CBATPSF, except that \({\mathbf {H}}\) is chosen uniformly and not from \(\textsc {TrapGen}\). Therefore, we immediately have, for any t,

$$\begin{aligned} Adv_{\mathcal {F}}^{OW}(t) \le Adv^{\text {SD}}_{(n,q,k,w)}(t) + Adv_{\mathcal {F}}^{TvsU}(t). \end{aligned}$$
(10)
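The hybrid step behind Eq. (10) can be spelled out as follows: given an adversary \(\mathcal {A}\) against one-wayness, the procedure that runs \(\mathcal {A}\) and tests whether it wins is itself a TrapGen-vs-uniform distinguisher, so

```latex
% Hybrid argument for Eq. (10): switching H from TrapGen to uniform
% changes A's winning probability by at most Adv^{TvsU}.
\begin{aligned}
Adv_{\mathcal{F}}^{OW}(\mathcal{A})
  &= \mathbb{P}\big(\mathcal{A} \text{ wins} : ({\mathbf{H}},\tau) \leftarrow \textsc{TrapGen}(1^\lambda)\big) \\
  &\le \mathbb{P}\big(\mathcal{A} \text{ wins} : {\mathbf{H}} \mathop{\hookleftarrow}\limits^{\$} {\text{FR}}^{k,n}_q\big)
     + Adv_{\mathcal{F}}^{TvsU}(t) \\
  &= Adv^{\text{SD}}_{(n,q,k,w)}(\mathcal{A}) + Adv_{\mathcal{F}}^{TvsU}(t)
   \le Adv^{\text{SD}}_{(n,q,k,w)}(t) + Adv_{\mathcal{F}}^{TvsU}(t).
\end{aligned}
```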

Now consider the signature scheme \(\textsc {S}^{\mathcal {F}}\) based on a CBATPSF \(\mathcal {F}\) as defined in Definition 5. By combining Theorem 1, Proposition 4, and Eq. (10), we immediately get the following proposition.

Proposition 7

Let \(\mathcal {F}\) be an \(\varepsilon \)-CBATPSF as defined in Definition 5 and let \(\textsc {S}^{\mathcal {F}}\) be the corresponding signature scheme. We have:

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le q_{\text {hash}}\left( Adv^{\text {SD}}_{(n,q,k,w)}(\widetilde{O}(t)) + Adv_{\mathcal {F}}^{TvsU}(t)\right) + q_{\text {sign}}\left( \varepsilon + \frac{q_{\text {sign}} + q_{\text {hash}}}{2^{\lambda _0}}\right) . $$

Notice here that there is a \(q_{\text {hash}}\) factor in front of \(Adv^{\text {SD}}_{(n,q,k,w)}(\widetilde{O}(t))\) which implies a significant loss in the security reduction.
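To see the magnitude of this loss, a back-of-the-envelope computation (with an assumed query budget of \(2^{64}\) hash queries, a common modeling choice, and ignoring the lower-order terms) shows how many bits of hardness the loose bound consumes compared to a tight one:

```python
import math

def required_hardness_bits(target_security_bits, q_hash, tight):
    """Bits of hardness the underlying problem must provide so that the
    reduction still certifies `target_security_bits` of EUF-CMA security.
    A loose bound with a multiplicative q_hash factor (as in Proposition 7)
    costs an additive log2(q_hash) bits; a tight bound costs essentially
    nothing."""
    loss = 0 if tight else math.log2(q_hash)
    return target_security_bits + loss

lam, q_hash = 128, 2 ** 64
loose = required_hardness_bits(lam, q_hash, tight=False)   # 192.0 bits needed
tight = required_hardness_bits(lam, q_hash, tight=True)    # 128 bits needed
```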

Claw(RF) vs. DOOM Problem. Decoding One Out of Many (DOOM) is a multitarget generalization of the syndrome decoding problem. Instead of receiving a single syndrome \({\mathbf {s}}\) as input, we receive several random syndromes and must solve the syndrome decoding problem for one of them. We model access to these syndromes by giving black-box access to a random function h that outputs them.

Problem 3

(Decoding One Out of Many - DOOM(n, q, k, w)).

  • Instance: a parity-check matrix \({\mathbf {H}}\in \mathbb {F}_q^{(n-k)\times n}\) of rank \(n-k\) and a function \(h : D \rightarrow \mathbb {F}_q^{n-k}\) for some domain D, to which we only have black-box access.

  • Output: \({\mathbf {e}}\in \mathbb {F}_q^{n}\) of Hamming weight w and an index \(i \in D\) such that \({\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = h(i)\).

This problem is sometimes stated with N random syndromes \({\mathbf {s}}_1,\dots ,{\mathbf {s}}_N\) given as input. That variant fits our formulation by setting \({\mathbf {s}}_i = h(i)\) for \(i \in \{1,\dots ,N\}\). The problem has been studied in quite a few papers [JJ02, Sen11, DST17, BCDL19]. It is easier than the syndrome decoding problem, sometimes quite significantly so. We define the DOOM advantage as follows, for any query algorithm \(\mathcal {A}^\mathcal {O}\).

Definition 7

(DOOM-advantage(n, q, k, w)). For any query algorithm \(\mathcal {A}^{\mathcal {O}}\), we define

$$\begin{aligned} Adv_{(n,q,k,w)}^{\text {DOOM}}(\mathcal {A}^\mathcal {O}) \mathop {=}\limits ^{\triangle }\mathbb {P}\Big ({\mathbf {e}}{{\mathbf {H}}}^{ {\intercal } } = h(i) \text{ and } |{\mathbf {e}}| = w : {\mathbf {H}}\mathop {\hookleftarrow }\limits ^{\$}{\text {FR}}^{k,n}_q, h \mathop {\hookleftarrow }\limits ^{\$}\mathfrak {F}^{D_\lambda }_{R_\lambda }, \ ({\mathbf {e}},i) \leftarrow \mathcal {A}^{\mathcal {O}_h}({\mathbf {H}}) \Big ), \end{aligned}$$

and for any time t, we also define \( Adv^{\text {DOOM}}_{(n,q,k,w)}(t,q) \mathop {=}\limits ^{\triangle }\max _{\mathcal {A}^\mathcal {O}: |\mathcal {A}^\mathcal {O}| = (t,q)} Adv^{\text {DOOM}}_{(n,q,k,w)}(\mathcal {A}^\mathcal {O}). \)
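On toy parameters, the multitarget nature of DOOM can be illustrated with an exhaustive search (our own illustrative code, not an actual attack): the search succeeds as soon as the syndrome of a candidate \({\mathbf {e}}\) hits any of the targets \(h(1),\dots ,h(N)\), which is why DOOM can only be easier than single-target syndrome decoding.

```python
import itertools

def solve_doom(H, syndromes, n, q, w):
    """Exhaustive multitarget decoder: find e in F_q^n with |e| = w whose
    syndrome e * H^T equals h(i) for SOME i, where h(i) = syndromes[i].
    H is given as a list of its n-k rows; returns (e, i) or None."""
    targets = {tuple(s): i for i, s in enumerate(syndromes)}
    for support in itertools.combinations(range(n), w):
        for values in itertools.product(range(1, q), repeat=w):
            e = [0] * n
            for pos, v in zip(support, values):
                e[pos] = v
            s = tuple(sum(ei * hi for ei, hi in zip(e, row)) % q
                      for row in H)
            if s in targets:              # any of the N targets will do
                return e, targets[s]
    return None

# Toy instance over F_2: two target syndromes, weight w = 1.
H = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
result = solve_doom(H, syndromes=[(1, 1), (0, 1)], n=4, q=2, w=1)
```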

Again, this is exactly the Claw(RF) advantage up to the distribution of the input matrix \({\mathbf {H}}\). This means we have

$$\begin{aligned} Adv_{\mathcal {F}}^{Claw(RF)}(t,q) \le Adv^{\text {DOOM}}_{(n,q,k,w)}(t,q) + Adv_{\mathcal {F}}^{TvsU}(t). \end{aligned}$$
(11)

By combining Theorem 1 and Eq. (11), we immediately get:

Proposition 8

Let \(\mathcal {F}\) be an \(\varepsilon \)-CBATPSF as defined in Definition 5 and let \(\textsc {S}^{\mathcal {F}}\) be the corresponding signature scheme. We have

$$ Adv_{\textsc {S}^{\mathcal {F}}}^{\mathrm {EUF}\hbox {-}\mathrm {CMA}}(t,q_{\text {hash}},q_{\text {sign}}) \le Adv^{\text {DOOM}}_{(n,q,k,w)}(t,q_{\text {hash}}) + Adv_{\mathcal {F}}^{TvsU}(t) + q_{\text {sign}}\left( \varepsilon + \frac{q_{\text {sign}} + q_{\text {hash}}}{2^{\lambda _0}}\right) . $$

Here, we have a reduction to the DOOM problem. Even though DOOM is an easier problem than syndrome decoding, the reduction is tight, so overall it gives much better results.

Using the Collision Problem. One could also, as in [GPV08], replace the Claw(RF) problem with the collision problem. However, in the case of codes, this problem is much easier. In fact, there is a large range of parameters for which collisions can be found in polynomial time while the syndrome decoding and DOOM problems require exponential time.

8.3 Wave Instantiation

In this context, the authors of [DST19a] constructed a signature scheme called Wave, based on a CBATPSF family. As far as we know, this is the first post-quantum signature scheme based on this paradigm that does not rely on lattice-based assumptions. To accomplish this, they introduced a family of codes which forms their trapdoor, namely permuted generalized \((U,U\,+\,V)\)-codes. The presentation of this trapdoor is beyond the scope of this paper. Wave constructs an \(\varepsilon \)-CBATPSF family with the following parameters:

$$ n = 66.34 \lambda , \ w = 0.9396n, \ q = 3, \ k = 0.66n. $$

With this choice of parameters, \(\varepsilon \) and \(Adv_\mathcal {F}^{TvsU}(t)\) are small enough to achieve 128 bits of security. The interested reader can consult the long version of Wave [DST19b], in particular [DST19b, Theorem 3, p. 39] and [DST19b, Proposition 14, p. 31], for more details. Notice also that the public key has \(\log _2(3)n^2\) bits.
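For concreteness, plugging \(\lambda = 128\) into the parameters above gives the following rough sizes (the rounding is ours; the exact parameters in the Wave papers may differ slightly):

```python
import math

lam = 128
n = round(66.34 * lam)            # code length: 8492
w = round(0.9396 * n)             # weight of the sampled preimages
k = round(0.66 * n)               # code dimension
pk_bits = math.log2(3) * n ** 2   # public key size in bits, as stated above

# Doubling n multiplies this n^2-bit public key by a factor of 4, which is
# the cost a loose reduction would impose (see below).
```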

Here, we are in a parameter range where the collision problem is easy, so we cannot use the original tight GPV bound. However, as studied in [BCDL19], the best classical algorithms for DOOM have essentially the same complexity as those for syndrome decoding. This means that if Proposition 7 were used instead of Proposition 8 to derive the security of Wave, we would have to double the value of n and hence increase the public key size (already quite large) by a factor of 4. This concrete example shows the importance of our results.

9 Conclusion

In this paper, we extended the GPV construction of signature schemes by allowing the use of an ATPSF instead of a TPSF. We also presented a security reduction of these signature schemes to the Claw(RF) problem which is not only tight but also optimal, meaning that an algorithm that solves the Claw(RF) problem also breaks the underlying signature scheme.

Our results allow us to extend the GPV construction to non-lattice-based schemes. In particular, for code-based cryptography, it is often easy to find collisions for the underlying trapdoor function. We showed that with this construction we cannot have a tight reduction to syndrome decoding, so we cannot ignore the non-tightness to \(\text {SD}\). On the other hand, losing a factor q (the number of queries) in the security reduction is far too pessimistic. The right approach is to consider the Claw(RF) problem, which in the code-based setting is the DOOM problem. Because of our optimality results, Claw(RF) should always be the problem studied in GPV-like constructions in order to correctly assess the security of the signature scheme.

More generally, we advocate that all Hash-and-Sign signatures should follow similar guidelines. This was done implicitly for lattices because SIS and ISIS are considered to be of the same difficulty and the associated Claw(RF) problem lies between them.