1 Introduction

The question of minimizing the parallel time complexity of cryptographic primitives has been the subject of an extensive body of research. At the extreme, one would aim for an ultimate level of efficiency in the form of a constant parallel-time implementation. Namely, the goal is to have “local” cryptographic constructions in which each bit of the output depends only on a small constant number of input bits, and each bit of the input influences only a constant number of outputs. Achieving both constant input locality and constant output locality allows an implementation by a constant-depth circuit with bounded fan-in and bounded fan-out [8]. Furthermore, such local constructions have turned out to be surprisingly helpful in speeding up the sequential complexity of cryptography [19]. At a more abstract level, the study of locally computable cryptography allows us to understand whether extremely simple functions can generate cryptographic hardness.

Intuitively, one may suspect that functions with local input-output dependencies are vulnerable to algorithmic attacks. Still, during the last decade it was shown that, under standard intractability assumptions, many cryptographic tasks can be implemented by local functions [68]. This includes basic primitives such as one-way functions and pseudorandom generators, as well as more complicated primitives such as public-key encryption schemes. One notable exception, for which such a result is unknown, is hash functions with linear shrinkage.

A collection of hash functions \(\mathcal {H}=\left\{ h:\{0,1\}^n\rightarrow \{0,1\}^{m}\right\} \) shrinks a long n-bit string into a shorter string of length \(m<n\) such that, given a random function \(h\mathop {\leftarrow }\limits ^{R}\mathcal {H}\) and a target string x, it is hard to find a sibling \(y\ne x\) that collides with x under h. The exact specification of the above game corresponds to different notions of hashing. We will mainly consider universal one-way hash functions (UOWHFs) [23], in which the adversary specifies the target string x without seeing the function h. (This property is also known as target collision resistance [9], TCR in short.) A central parameter of a hash function is the amount of shrinkage it provides. We measure this as the difference between the output length m and the input length n, namely the additive shrinkage \(n-m\). We say that the shrinkage is linear if \(n-m=\Omega (n)\), i.e., \(m<(1-\varepsilon )n\) for some constant \(\varepsilon \). In this paper we ask:

Are there UOWHFs with linear shrinkage and constant output and/or input locality?

Previous Results. In [7] it is shown that any log-space computable UOWHF can be converted into a UOWHF with constant output locality and sublinear shrinkage of \(n-m=n^{\varepsilon }\), for a constant \(\varepsilon <1\). (A similar result holds for collision-resistant hash functions.) This gives rise to UOWHFs with constant output locality based on standard cryptographic assumptions (e.g., factoring), or, more generally, on any log-space computable one-way function [17, 23, 26]. Although there are several ways to amplify the shrinkage of a UOWHF (cf. [9, 23]), none of these transformations preserve low locality, and so the question of obtaining UOWHFs with linear shrinkage and constant output locality has remained wide open.

The situation is even worse for constant input locality. In [8] it was shown that tasks which involve secrecy (e.g., one-wayness, pseudorandomness, symmetric or public-key encryption) can be implemented with constant input locality (under plausible assumptions), while tasks which require some form of non-malleability (e.g., MACs, signatures, non-malleable encryption) cannot be implemented with constant input locality. Interestingly, hash functions escaped this characterization. Although it is easy to find near-collisions in a function with constant input locality (simply flip the first bit of the target x), it is unknown how to extend this to a full collision. Overall, the question of computing UOWHFs with constant input locality has remained open, even for the case of a single-bit shrinkage \(n-m=1\).Footnote 1 Put differently, high input locality (as captured by the so-called confusion/diffusion or avalanche principle) is typically viewed as a desired property for collision resistance – but is it really necessary?

1.1 Main Result

We construct the first locally computable UOWHF with linear shrinkage. Our construction has both constant input locality and constant output locality, and is based on the one-wayness of random local functions (also known as Goldreich’s one-way function [16]). The latter assumption asserts that a random local function \(f:\{0,1\}^n\rightarrow \{0,1\}^{m}\) is one-way where f is chosen uniformly at random as follows. View the n inputs and m outputs as nodes in a bipartite graph G and connect each output node \(y_i\) to a random set of d distinct input nodes. To compute the ith output apply some fixed d-ary predicate \(P:\{0,1\}^{d}\rightarrow \{0,1\}\) to the d inputs that are connected to \(y_i\). This experiment defines a distribution \(\mathcal {F}_{P,n,m}\) over functions with output locality of d. (See Sect. 2 for a formal definition.) We prove the following theorem.

Theorem 1.1

(Main theorem) There exists a constant d and a predicate \(P:\{0,1\}^d\rightarrow \{0,1\}\) for which the following holds. If the collection \(\mathcal {F}_{P,n,m=\Omega (n^3)}\) is one-way, then there exists a collection \(\mathcal {H}\) of UOWHFs with linear shrinkage, constant input locality and constant output locality.

The theorem is constructive and can be applied to every predicate which satisfies a simple condition. In particular, we show that the predicate \(\mathsf {MST}_{d_1,d_2}(x,y)=(y_{1}\oplus \ldots \oplus y_{d_1}) \oplus (x_1 \wedge \ldots \wedge x_{d_2})\), defined by [22], satisfies the condition for every \(d_2\ge 2\) and every sufficiently large odd constant \(d_1\). The hypothesis of the theorem (one-wayness of random local functions) has been extensively studied in the last few years and is supported both experimentally [13, 24] and theoretically [2, 11, 13, 14, 16, 20, 21]. In fact, recent evidence suggests that, for a proper predicate, this collection may even be pseudorandom [4, 5]. Interestingly, Theorem 1.1 can be proved under the (possibly weaker) assumption that \(\mathcal {F}_{P,n,m=\Omega (n)}\) is a weak pseudorandom generator (i.e., its output cannot be distinguished from a truly random string with advantage better than, say, 0.1).
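To make the predicate concrete, the following Python sketch (the helper names are ours, and the code is purely illustrative) implements \(\mathsf {MST}_{d_1,d_2}\) as stated above: the XOR of the first \(d_1\) bits, XORed with the AND of the remaining \(d_2\) bits.

```python
from itertools import product

def mst(d1, d2):
    """The MST_{d1,d2} predicate on d1 + d2 bits: the XOR of the first
    d1 bits, XORed with the AND of the remaining d2 bits."""
    def p(bits):
        xor_part = 0
        for b in bits[:d1]:
            xor_part ^= b
        and_part = 1
        for b in bits[d1:d1 + d2]:
            and_part &= b
        return xor_part ^ and_part
    return p

P = mst(3, 2)  # a 5-ary instance with odd d1 = 3 and d2 = 2

# Since the first coordinate enters the XOR part linearly, the predicate
# is balanced: exactly half of the 2^5 inputs map to 1.
ones = sum(P(w) for w in product((0, 1), repeat=5))
```

Note that the sensitive first coordinate is exactly what makes the balancedness count come out to \(2^{d-1}\).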

There are several interesting corollaries that follow from Theorem 1.1. First, it is possible to reduce the output locality to 3 (which is optimal) while preserving (tiny) linear shrinkage (i.e., \(m=(1-\varepsilon )n\) for some small \(\varepsilon \)) via the compiler of [7].Footnote 2 Second, by self-composing \(\mathcal {H}\) a constant number of times, one can get arbitrary linear shrinkage (i.e., \(m=\varepsilon n\) for an arbitrary constant \(\varepsilon >0\)) at the expense of increasing the locality to a larger constant. Furthermore, by iterating \(\mathcal {H}\) a logarithmic number of times we get a linear time computable hash function \(\mathcal {H}'\) with polynomial shrinkage factor of \(m=n^{\varepsilon }\) (the ith level of the circuit contains \(O(n/2^{i})\) gates). As observed by [19], one can then employ the Naor–Yung transform [23] and sign n-bit messages with linear time complexity and only additive cryptographic overhead, i.e., \(O(n+\kappa )\). (See Sect. 6 for details.) This is contrasted with standard signature schemes whose complexity grows multiplicatively with the security parameter, i.e., \(O(n\kappa )\). Previously, such linear time computable UOWHFs and signatures were only known to exist assuming that Goldreich’s collection is exponentially-hard to invert [19].Footnote 3

1.2 Techniques

Hashing via Random Local Functions? As a starting point, we ask whether the collection \(\mathcal {F}_{P,n,m=n(1-\varepsilon )}\) itself can be used, even heuristically, as a UOWHF. To make the question non-trivial, let us assume that the distribution of the input-output dependency graph is slightly modified such that the graph is (cd)-regular, i.e., each input affects c outputs and each output depends on d inputs. (Otherwise, we are likely to have some inputs of degree 0, with no influence at all.) For concreteness let us think of P as the majority predicate. A moment of reflection suggests that collisions are easy to find even with respect to a random target string x. Indeed, suppose that there exists an input variable \(x_i\) all of whose neighboring inputs (i.e., the inputs that share an output with \(x_i\)) turn out to be zero. In this case, we can flip the insensitive input \(x_i\) without affecting the output of the function, and in this way obtain a trivial collision. Observe that each input variable has a constant probability of being insensitive as it has at most \(cd=O(1)\) neighbors. Overall, one is likely to find \(\Omega (n)\) insensitive inputs. Furthermore, by collecting an independent set I of insensitive inputs (that do not share any common output) one can simultaneously flip any subset of the inputs in I without changing the output. Hence, we find exponentially many collisions \(x'\) which form a “ball” around x of diameter \(\Omega (n)\). It is not hard to show that a similar attack can be applied to \(\mathcal {F}_{P,n,m}\) for every predicate P except for XOR or its negation. (Unfortunately, in the latter case collisions can be found via Gaussian elimination.)
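The attack sketched above is easy to simulate. The following Python sketch (with illustrative parameters and helper names of our choosing; it samples uniform d-tuples rather than a (c, d)-regular graph) collects the insensitive inputs of a random local function for the majority predicate, each of which yields a trivial collision when flipped:

```python
import random

def majority(bits):
    # majority of an odd number of bits
    return int(sum(bits) > len(bits) // 2)

def sample_graph(n, m, d, rng):
    # each output node is wired to d distinct input nodes
    return [rng.sample(range(n), d) for _ in range(m)]

def evaluate(G, P, x):
    return [P([x[j] for j in S]) for S in G]

def insensitive_inputs(G, P, x):
    """Inputs that can be flipped without changing any output bit."""
    y = evaluate(G, P, x)
    found = []
    for i in range(len(x)):
        x2 = list(x)
        x2[i] ^= 1
        if evaluate(G, P, x2) == y:
            found.append(i)
    return found

rng = random.Random(1)
n, m, d = 400, 360, 5          # illustrative parameters with m < n
G = sample_graph(n, m, d, rng)
x = [rng.randint(0, 1) for _ in range(n)]
I = insensitive_inputs(G, majority, x)
```

Flipping any single input in I gives a collision with x, and flipping any subset of an independent set within I gives exponentially many collisions, as discussed above.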

Despite this failure let us keep asking: Can \(\mathcal {F}_{P,n,m}\) achieve some, possibly weak, form of collision resistance? Specifically, one may hope to show that it is hard to find collisions which are \(\beta \)-far from the target x, for some (non-trivial) constant \(\beta \). This assumption is intuitively supported by the study of the geometry of the solutions of random Constraint Satisfaction Problems (e.g., Random SAT) [1]. Thinking of each output as inducing a local constraint on the inputs, it can essentially be shown that, for under-constrained problems where \(m<n\), the space of solutions (siblings of x) is shattered into far-apart clusters of Hamming-close solutions. It is believed that efficient algorithms cannot move from one cluster to another, as such a transition requires passing through solutions \(x'\) which violate many constraints (i.e., \(f(x')\) is far, in Hamming distance, from f(x)). Therefore, it seems plausible to conjecture that the collection \(\mathcal {F}_{P,n,m}\) is secure with respect to \(\beta \)-far collisions.

As our main technical contribution, we prove that a weak form of this conjecture holds assuming the pseudorandomness of \(\mathcal {F}_{P,n,m'}\) (where \(m'> n > m\)). Specifically, we prove the following theorem. (See Sect. 4 for details).

Theorem 1.2

There exists a predicate P, constants \(\varepsilon ,\beta \in (0,\frac{1}{2})\) and \(c>1\) such that for every \(\delta >0\), if \(\mathcal {F}_{P,n,cn}\) is \((\delta /3)\)-pseudorandom, then it is hard to find \(\beta \)-far target collisions in \(\mathcal {F}_{P,n,(1-\varepsilon )n}\) with probability better than \(\delta \).

We mention that we can base the theorem on the one-wayness of random local functions using the reduction of [4].

Proof idea

Let \(m=(1-\varepsilon )n\). Let P be a balanced predicate, which, in addition, enjoys the following sensitivity properties:Footnote 4

$$\begin{aligned} \forall x,x'\in \{0,1\}^n, \Delta (x,x')>\beta&\Rightarrow \mathop {{{\mathrm{\mathbb {E}}}}}\limits _{f\mathop {\leftarrow }\limits ^{R}\mathcal {F}_{P,n,m}}[\Delta (f(x),f(x'))]>\gamma \qquad \text {for some constants } \beta ,\gamma >0 \\ \forall x,x'\in \{0,1\}^n, \Delta (x,x')=\frac{1}{2}&\Rightarrow \mathop {{{\mathrm{\mathbb {E}}}}}\limits _{f\mathop {\leftarrow }\limits ^{R}\mathcal {F}_{P,n,m}}[\Delta (f(x),f(x'))]=\frac{1}{2}, \end{aligned}$$

where \(\Delta (\cdot ,\cdot )\) denotes the relative Hamming distance and \({{\mathrm{\mathbb {E}}}}\) denotes expectation. An example of such a predicate is parity \(\oplus _d\) with an odd arity d. A relaxation of the above properties (e.g., by considering only x of Hamming weight \(\frac{1}{2}\)) allows us to use richer families of predicates including \(\mathsf {MST}_{d_1,d_2}\) for every \(d_2\ge 2\) and every odd constant \(d_1\). (Larger \(d_1\) pushes \(\beta \) toward zero and increases \(\gamma \) toward \(\frac{1}{2}\).)
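For parity with odd arity, the stated properties can be checked directly. When tuple indices are sampled with replacement, a tuple changes the output of \(\oplus _d\) exactly when it hits an odd number of disagreeing coordinates, giving flip probability \((1-(1-2\alpha )^d)/2\); for odd d this equals \(\frac{1}{2}\) at \(\alpha =\frac{1}{2}\) and is bounded away from zero for \(\alpha >\beta \). The following Python sketch (parameters are illustrative) compares this closed form against a Monte Carlo estimate with tuples drawn without replacement, as in the actual distribution:

```python
import random

def parity(bits):
    p = 0
    for b in bits:
        p ^= b
    return p

def parity_flip_prob(alpha, d):
    # with replacement, each tuple entry lands on a disagreeing
    # coordinate independently with probability alpha, so the output
    # flips with probability (1 - (1 - 2*alpha)**d) / 2
    return (1 - (1 - 2 * alpha) ** d) / 2

rng = random.Random(0)
n, d, alpha = 2000, 5, 0.1
x = [rng.randint(0, 1) for _ in range(n)]
x2 = list(x)
for i in rng.sample(range(n), int(alpha * n)):
    x2[i] ^= 1                       # x2 is alpha-far from x

trials = 20000
hits = 0
for _ in range(trials):
    S = rng.sample(range(n), d)      # d distinct indices, as in F_{P,n,m}
    if parity([x[j] for j in S]) != parity([x2[j] for j in S]):
        hits += 1
estimate = hits / trials
```

The small gap between the two sampling models is exactly the o(1) error term handled by Lemma 3.1.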

Assume that we have an algorithm \(\mathcal {A}\) that, given a random function \(h\mathop {\leftarrow }\limits ^{R}\mathcal {F}_{P,n,m}\) and a random target w, finds a \(\beta \)-far sibling with probability \(\delta \). Let us first try to use \(\mathcal {A}\) to invert the collection \(\mathcal {F}_{P,n,m'}\) with output length of \(m' \approx 2m\). Given a random function \(f_G\mathop {\leftarrow }\limits ^{R}\mathcal {F}_{P,n,m'}\) specified by a random input-output dependencies graph G, and an image \(y=f_G(x)\) of a random point \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\), we will recover the preimage x as follows.

First, we choose a target w uniformly at random and partition the graph G into two graphs: \(G_=\) which contains only the output nodes for which \(f_G(w)\) agrees with y, and \(G_{\ne }\) which contains the remaining output nodes. Hence,

$$\begin{aligned} f_{G_{=}}(x)=f_{G_{=}}(w) \qquad \text {and} \qquad f_{G_{\ne }}(x)= \overline{f_{G_{\ne }}(w)}, \end{aligned}$$

where \(\overline{z}\) denotes the bit-wise complement of the string z. Since P is balanced, each subgraph contains roughly \(m'/2\) outputs. Next, we ask \(\mathcal {A}\) for a \(\beta \)-far sibling \(w'\) of w under the function \(f_{G_=}\). As we will see next, \(w'\) is likely to be correlated with the preimage x, in the sense that for some constant \(\alpha >0\), either \(w'\) or its complement \(\overline{w'}\) agrees with x on a \((\frac{1}{2}+\alpha )\)-fraction of their coordinates. At this point, we will employ a result of [10] that allows us to fully recover x given such a correlated string \(w'\) (and additional O(n) outputs).
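The partition step can be sketched in code. The snippet below (Python; the 3-ary XOR stands in for the actual predicate, and all names and parameters are illustrative) splits the outputs of a random local function into \(G_=\), on which the hidden preimage x collides with the target w, and \(G_{\ne }\), on which the outputs on x are the complement of those on w:

```python
import random

def xor3(bits):  # a balanced stand-in predicate with a sensitive coordinate
    return bits[0] ^ bits[1] ^ bits[2]

def evaluate(G, P, x):
    return [P([x[j] for j in S]) for S in G]

rng = random.Random(7)
n, m2, d = 200, 400, 3                      # m2 plays the role of m'
G = [rng.sample(range(n), d) for _ in range(m2)]
x = [rng.randint(0, 1) for _ in range(n)]   # the hidden preimage
y = evaluate(G, xor3, x)                    # the challenge y = f_G(x)

w = [rng.randint(0, 1) for _ in range(n)]   # the random target
yw = evaluate(G, xor3, w)

# split the output nodes by whether f_G(w) agrees with y
G_eq  = [S for S, a, b in zip(G, yw, y) if a == b]
G_neq = [S for S, a, b in zip(G, yw, y) if a != b]
```

By construction \(f_{G_=}(x)=f_{G_=}(w)\) and \(f_{G_{\ne }}(x)=\overline{f_{G_{\ne }}(w)}\), and since the predicate is balanced the two parts have roughly equal size.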

It remains to show that \(w'\) is likely to be correlated with the preimage x. Using the sensitivity properties of the predicate P, this boils down to proving that \(f_G(w')\) and \(f_G(x)\) agree on a \((\frac{1}{2}+\alpha ')\)-fraction of their coordinates, for some constant \(\alpha '>0\). Let us first (optimistically) assume that \(w'\) is statistically independent of the subgraph \(G_{\ne }\) that was not submitted to the adversary. That is, imagine that this part of the dependencies graph is chosen uniformly at random after \(w'\) is obtained. Since w is \(\beta \)-far from \(w'\), this pair is expected to disagree on a constant fraction \(\gamma \) of the remaining coordinates of \(f_{G_{\ne }}\). Namely,

$$\begin{aligned} \Delta (f_{G_{\ne }}(w),f_{G_{\ne }}(w'))>\gamma . \end{aligned}$$

Since \(f_{G_{\ne }}(x)= \overline{f_{G_{\ne }}(w)}\) it follows that

$$\begin{aligned} \Delta (f_{G_{\ne }}(x),f_{G_{\ne }}(w'))<1-\gamma . \end{aligned}$$

Furthermore, since \(w'\) collides with w under \(f_{G_=}\) we have that

$$\begin{aligned} f_{G_=}(x)=f_{G_=}(w)=f_{G_=}(w'). \end{aligned}$$

We conclude that x and \(w'\) agree on a fraction of \(1-\frac{1}{2}(1-\gamma )=\frac{1}{2}+\gamma /2\) of the outputs of \(f_G\) (a \(\gamma \)-fraction of the coordinates of \(f_{G_{\ne }}\) and all the coordinates of \(f_{G_{=}}\)).

The above argument is over-optimistic, since it is not clear that \(w'\) is statistically independent of the subgraph \(G_{\ne }\). (Indeed, the adversary \(\mathcal {A}\) chooses \(w'\) based on \((w,G_{=})\), which contains some information on x and, therefore, also on \(G_{\ne }\).) Fortunately, we can show that a failure of the above approach allows us to distinguish the string \(y=f_{G}(x)\) from a truly random string. Hence, we are in a win–win situation: Either we can invert \(\mathcal {F}\) by finding a correlated string or we can distinguish its output from a random string. So the theorem can be based on the pseudorandomness of \(\mathcal {F}_{P,n,n+\Omega (n)}\). \(\square \)

The above reduction leaves us with a \(\delta \)-secure \(\beta \)-TCR \(\mathcal {H}\) of linear shrinkage \(n-m=\varepsilon n\). To prove Theorem 1.1, we show that, for sufficiently small constants \(\delta ,\beta >0\), any \((\delta ,\beta )\)-target collision resistant (TCR) collection \(\mathcal {H}\) can be locally amplified into a standard TCR collection while preserving linear shrinkage. This is done via the following steps.

Amplifying Hardness. First, we reduce the security parameter \(\delta \) to be negligible at the expense of slightly increasing the distance parameter \(\beta \) to, say, \(\beta +2\delta \). This is done by taking t independent copies of \(\mathcal {H}\) and applying them to t independent inputs, i.e., \(h'(x_1,\ldots ,x_t)=(h_1(x_1),\ldots ,h_t(x_t))\). It is not hard to see that any \((\beta +2\delta )\)-far collision \(x=(x_1,\ldots ,x_t)\) and \(y=(y_1,\ldots ,y_t)\) under \(h'\) induces \(\beta \)-far collisions \((x_i,y_i)\) for at least a \(2\delta \)-fraction of the copies of h. Standard Threshold Direct Product Theorems (e.g., [18, Theorem 5.2]) guarantee that the latter task cannot be achieved with more than negligible probability.
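The direct-product combiner is straightforward to express in code. The sketch below (Python, with toy stand-in hash functions of our choosing; not the paper's actual instantiation) shows \(h'(x_1,\ldots ,x_t)=(h_1(x_1),\ldots ,h_t(x_t))\):

```python
def direct_product(hash_fns):
    """h'(x_1,...,x_t) = (h_1(x_1),...,h_t(x_t)) for t independent copies."""
    def h_prime(xs):
        assert len(xs) == len(hash_fns)
        return tuple(h(x) for h, x in zip(hash_fns, xs))
    return h_prime

# toy stand-in "hash functions"; real copies would be sampled from H
hs = [lambda x: x % 3, lambda x: x % 5]
hp = direct_product(hs)
```

A collision under hp in which coordinate i differs is, by construction, a collision under hs[i], which is the per-copy structure the threshold direct product argument exploits.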

Eliminating Close Collisions. In the second step we eliminate \(\beta \)-close collisions by letting \(h'(x)=(h(x),Mx)\) where M is a parity-check matrix whose dual relative distance is \(\beta \). It is not hard to show that a pair of \(\beta \)-close strings x and \(x'\) will always be mapped by M to different outputs \(y\ne y'\), and so the function \(h'\) is immunized against \(\beta \)-close collisions. Since there are sparse parity-check matrices with constant dual relative distance (a.k.a. LDPC matrices), the transformation is locally computable.Footnote 5 Finally, note that although the shrinkage factor is slightly degraded, we can locally amplify it to any constant via a constant number of self-compositions.
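Mechanically, the immunization step appends a syndrome computation to the hash. The following Python sketch illustrates \(h'(x)=(h(x),Mx)\) with M represented by its row supports; the random sparse matrix shown is only for concreteness and does not by itself guarantee the required dual distance:

```python
import random

def sparse_parity_rows(n, r, row_weight, rng):
    """An illustrative sparse matrix M, given by the support of each row.
    A real construction would use an LDPC matrix with constant dual
    relative distance; sparsity is what keeps the map locally computable."""
    return [rng.sample(range(n), row_weight) for _ in range(r)]

def syndrome(M, x):
    # the product Mx over GF(2), with M given as a list of row supports
    return tuple(sum(x[j] for j in row) % 2 for row in M)

def immunize(h, M):
    # h'(x) = (h(x), Mx): colliding under h' forces equal syndromes,
    # which (for a suitable M) rules out beta-close collisions
    def h_prime(x):
        return (h(x), syndrome(M, x))
    return h_prime
```

Each syndrome bit reads only `row_weight` input bits, so the appended map preserves constant output locality.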

Organization. Section 2 gives the necessary preliminaries. In Sect. 3, we present a new notion of sensitivity for predicates, study its properties and identify a class of “good” predicates for which our results apply. In Sect. 4 we reduce the one-wayness of random local functions to \((\delta ,\beta )\) target collision resistance. Later, in Sect. 5, we show how to transform \((\delta ,\beta )\) TCR to standard TCR while preserving constant locality and linear shrinkage. Finally, in Sect. 6 we combine the results of the previous sections and derive the main theorem and its applications.

2 Preliminaries

General. We let [n] denote the set \(\left\{ 1,\ldots ,n\right\} \). For a pair of strings \(x,x'\in \{0,1\}^n\), we let \(\Delta (x,x')\) denote the relative Hamming distance between x and \(x'\), i.e., \(|\left\{ i\in [n]: x_i\ne x'_i\right\} |/n\). A pair of strings is \(\alpha \)-close if \(\Delta (x,x')\le \alpha \) and \(\alpha \)-far if \(\Delta (x,x')> \alpha \). By default, logarithms are taken to base 2. For reals \(p,q\in (0,1)\) we let \(H_2(p):=-p\log (p)-(1-p)\log (1-p)\) denote the binary entropy function, and \(D_2(p\Vert q):=p \log (\frac{p}{q}) +(1-p)\log (\frac{1-p}{1-q})\) denote the relative entropy function (also known as the binary Kullback–Leibler divergence). Observe that \(D_2(p\Vert \frac{1}{2})=1-H_2(p)\). We will use the following form of the Chernoff–Hoeffding bound:

Fact 2.1

(Additive Chernoff bound) Let \(X_1,\ldots ,X_n\) be i.i.d. random variables where \(X_i\in [0,1]\) and \({{\mathrm{\mathbb {E}}}}[X_i]=p\). Then, for every \(\varepsilon >0\),

$$\begin{aligned} \Pr \left[ n^{-1}\sum _i X_i \ge p+\varepsilon \right] \le 2^{-D_2(p+\varepsilon \Vert p)n}, \qquad \Pr \left[ n^{-1}\sum _i X_i \le p-\varepsilon \right] \le 2^{-D_2(p-\varepsilon \Vert p)n} \end{aligned}$$

A simpler form follows by noting that \(D_2(p +\varepsilon \Vert p)>2\varepsilon ^2\).
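The quantities above are easy to compute numerically. The sketch below (Python; function names are ours) implements \(H_2\), \(D_2\), and the resulting tail bound \(2^{-D_2(p+\varepsilon \Vert p)n}\), and can be used to check the identity \(D_2(p\Vert \frac{1}{2})=1-H_2(p)\) and the simpler bound \(D_2(p+\varepsilon \Vert p)>2\varepsilon ^2\):

```python
from math import log2

def H2(p):
    """Binary entropy function."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def D2(p, q):
    """Binary Kullback-Leibler divergence."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))

def chernoff_upper_tail(p, eps, n):
    # Pr[ (1/n) * sum X_i >= p + eps ] <= 2^(-D2(p + eps || p) * n)
    return 2 ** (-D2(p + eps, p) * n)
```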

Locality and Degree. Let \(f:\{0,1\}^n\rightarrow \{0,1\}^l\) be a function. We say that the ith output variable \(y_i\) depends on the jth input variable \(x_j\) (or equivalently, \(x_j\) affects the output \(y_i\)) if there exists a pair of input strings which differ only on the jth location whose images differ on the ith location. The locality of an output variable is the number of inputs on which it depends; the locality of an input variable is the number of outputs which it affects. We say that an output has degree d if it can be expressed as a multivariate polynomial of degree d in the inputs over the binary field \(\mathbb {F}_2\). The locality of an output variable trivially upper bounds its degree.

Collection of Functions. We model cryptographic primitives as collections of functions \(\mathcal {F}=\left\{ f_k:\{0,1\}^n\rightarrow \{0,1\}^{m(n)}\right\} _{k\in \{0,1\}^{s(n)}}\) equipped with a pair of efficient algorithms: (1) an evaluation algorithm which given \((k\in \{0,1\}^{s},x\in \{0,1\}^n)\) outputs \(f_k(x)\) and (2) a key-sampling algorithm \(\mathcal {K}\) which given \(1^n\) samples an index \(k\in \{0,1\}^{s(n)}\). We will sometimes keep the key sampler implicit and write \(f\mathop {\leftarrow }\limits ^{R}\mathcal {F}\) to denote the experiment where \(k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n)\) and \(f=f_k\). A collection of functions has constant output locality (resp., constant input locality) if there exists a constant d which does not grow with n such that for every fixed k each output (resp., input) of the function \(f_k\) has locality of at most d. Similarly, the collection has constant algebraic degree of d if for every fixed k each output of the function \(f_k\) has degree of at most d. The collection is locally computable if it has both constant input locality and constant output locality. When \(\mathcal {F}\) is used as a primitive, we will always assume that the adversary that tries to break it gets the collection index as a public parameter. Moreover, our constructions are all in the “public-coin” setting, and so they remain secure even if the adversary gets the coins used to sample the index of the collection.

One-wayness and Pseudorandomness. Let \(\delta (n)\in (0,1)\) and \(\beta (n)\in (0,\frac{1}{2})\). We say that a collection of functions \(\mathcal {F}=\left\{ f_k:\{0,1\}^n\rightarrow \{0,1\}^{m(n)}\right\} \) is \(\delta \)-secure \(\beta \)-approximation-resilient one-way (in short, \((\delta ,\beta )\) one-way) if for every efficient adversary \(\mathcal {A}\) the following event happens with probability at most \(\delta (n)\): Given \(k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n)\) and \(y=f_k(x)\) for random \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\), the adversary \(\mathcal {A}\) outputs a list of candidates \(X'\) which contains some string \(x'\) which is \(\beta \)-close to some preimage of y. Note that the size of the list is bounded by the running time of the adversary, which is polynomial in n. The special case of \(\beta =0\) corresponds to the standard notion of \(\delta \)-one-wayness, or simply one-wayness when \(\delta ={\mathrm {neg}}(n)\). This is consistent with standard one-wayness (cf. [15]) since when \(\beta =0\) the algorithm can efficiently check which of the candidates (if any) is a preimage and output only a single candidate rather than a list. A collection of functions \(\mathcal {F}\) is \(\delta \)-pseudorandom if \(\left| \Pr [\mathcal {A}(k,f_k(x))=1]-\Pr [\mathcal {A}(k,y)=1] \right| \le \delta (n),\) where \(k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n)\), \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\) and \(y\mathop {\leftarrow }\limits ^{R}\{0,1\}^{m}\).

Hash Functions. Let \(m=m(n)<n\) be an integer-valued function. A collection of functions \(\mathcal {H}=\left\{ h:\{0,1\}^n\rightarrow \{0,1\}^m\right\} \) is \(\delta \)-secure \(\beta \)-target collision resistant (\((\delta ,\beta )\)-TCR) if for every pair of efficient adversaries \(\mathcal {A}=(\mathcal {A}_1,\mathcal {A}_2)\) it holds that

$$\begin{aligned} \mathop {\mathop {{\Pr }}\limits _{(x,r) \mathop {\leftarrow }\limits ^{R}\mathcal {A}_1(1^n)}}\limits _{{k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n)}}[\mathcal {A}_2(k,x,r)=x' \text { s.t. } \Delta (x',x)>\beta \text { and } h_k(x)=h_k(x')]\le \delta , \end{aligned}$$

where \(\Delta (\cdot ,\cdot )\) denotes relative Hamming distance. That is, first the adversary \(\mathcal {A}_1\) specifies a target string x and a state information r; then, a random hash function h is selected, and then \(\mathcal {A}_2\) tries to form a \(\beta \)-far collision \(x'\) with x under h. The collection is \(\delta \)-secure \(\beta \)-random target collision resistant (\((\delta ,\beta )\) RTCR) if the above holds in the special case where \(\mathcal {A}_1\) outputs a uniformly chosen target string \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\) and an empty state information. (As we will see, there are standard local transformations from RTCR to TCR.) The standard notions of \(\delta \)-RTCR and \(\delta \)-TCR correspond to the case where \(\beta =0\) (or just \(\beta <1/n\)). If, in addition, \(\delta \) is negligible we obtain standard RTCR and TCR. The shrinking factor of \(\mathcal {H}\) is the ratio m/n. When \(m/n<1/(1+H_2(\beta ))\) and \(\delta =o(1)\), TCR and RTCR become non-trivial in the sense that their existence implies the existence of one-way functions. For an extensive study of hash functions see [9, 25].

Random Local Functions. Let \(P:\{0,1\}^d\rightarrow \{0,1\}\) be a predicate, and let \(G=(S_1,\ldots ,S_m)\) where each \(S_i\) is a d-tuple \((S_{i,1},\ldots ,S_{i,d})\) whose entries are d distinct elements of [n]. We will think of G as a bipartite graph with n input nodes and m output nodes where each output i is connected to the d (ordered) inputs in \(S_i\). We define the function \(f_{G,P}:\{0,1\}^n\rightarrow \{0,1\}^{m}\) as follows: Given an n-bit input x, the ith output bit \(y_i\) is computed by applying P to the restriction of x to the ith tuple \(S_i\), i.e.,

$$\begin{aligned} y_i=P(x_{S_i})=P(x_{S_{i,1}},\ldots ,x_{S_{i,d}}). \end{aligned}$$

For \(m=m(n)\) and some fixed predicate \(P:\{0,1\}^{d}\rightarrow \{0,1\}\), we let \(\mathcal {F}_{P,n,m}\) denote the collection \(\left\{ f_{G,P}:\{0,1\}^n\rightarrow \{0,1\}^{m(n)}\right\} \) where the key G is sampled by selecting m(n) tuples uniformly and independently at random from all the possible d-tuples with distinct elements. We refer to the latter distribution as the uniform distribution over (n, m, d) graphs and denote it by \(\mathcal {G}_{n,m,d}\). When the predicate P is clear from the context, we omit it from the subscript and write \(f_G\) and \(\mathcal {F}_{n,m}\). By definition, the ensemble \(\mathcal {F}_{P,n,m}\) has a constant output locality of d. However, some inputs will have large (super-constant) locality. Still, one can show, via a simple probabilistic argument, that the locality of most inputs will be close to the expectation md/n, which is constant when \(m=O(n)\). We will later use this fact to reduce the input locality to constant.
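The graph-sampling procedure, and the claim that most input localities concentrate around the expectation md/n, can be illustrated as follows (Python; all parameter choices are ours):

```python
import random

def sample_graph(n, m, d, rng):
    # the uniform distribution over (n, m, d) graphs: m independent
    # d-tuples of distinct input indices
    return [rng.sample(range(n), d) for _ in range(m)]

rng = random.Random(0)
n, d = 1000, 3
m = 2 * n                      # linear output length, m = O(n)
G = sample_graph(n, m, d, rng)

# input locality = number of outputs each input feeds into; its average
# is exactly m*d/n = 6, and most inputs stay close to this constant
loc = [0] * n
for S in G:
    for j in S:
        loc[j] += 1
avg = sum(loc) / n
```

The few inputs with super-constant locality are exactly the ones handled by the locality-reduction step mentioned above.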

3 Sensitivity

3.1 Overview

Let \(P:\{0,1\}^d\rightarrow \{0,1\}\) be a d-ary predicate. For a pair of strings \(x,x'\in \{0,1\}^n\), let \(s_P(x,x')\) be the expected relative Hamming distance between the images f(x) and \(f(x')\) where f is randomly chosen from \(\mathcal {F}_{P,n,m}\). Equivalently, we may write \(s_P(x,x')\) as

$$\begin{aligned} \Pr _S[P(x_S)\ne P(x'_S)], \end{aligned}$$
(1)

where S is a random d-tuple with distinct elements \((i_1,\ldots ,i_d)\) which are chosen from [n] uniformly at random (without replacement).

Imagine the following experiment: first x is chosen uniformly at random, and then an \(\alpha \)-far string \(x'\) is chosen adversarially in order to minimize \(s_P(x,x')\). We will be interested in predicates P for which, except with negligible probability over the choice of x, the value of \(s_P(x,x')\) in the above experiment will be relatively high (as a function of \(\alpha \)).

To analyze this property we make several simple observations. By symmetry, the strategy of the adversary boils down to selecting the fraction \(\alpha _{0,1}\) of 0’s which are flipped to 1, and the fraction \(\alpha _{1,0}\) of 1’s which are flipped to 0’s (where \(\alpha =\alpha _{0,1}+\alpha _{1,0}\)). Furthermore, it suffices to analyze a simpler experiment in which x is a random string of Hamming weight n/2 and the tuple S (from Eq. 1) is chosen by selecting d indices uniformly at random from [n] with replacement (i.e., the entries may not be distinct). We will show (in Lemma 3.1) that, with all but negligible probability over x, these simplifications have only a minor effect on the value of the experiment (the error tends to zero with n). We will later show (Lemma 3.3) that for all constants \(\beta >0\) and \(\gamma <\frac{1}{2}\) there are concrete (nonlinear) highly sensitive predicates for which a modification of more than a \(\beta \) fraction of the inputs flips the output with probability larger than \(\gamma \).

3.2 Generalized Noise Sensitivity

The above discussion motivates a new quantitative measure of sensitivity which refines the standard notion of noise sensitivity. For \(\alpha _{0,1},\alpha _{1,0}\in [0,\frac{1}{2}]\), let \(\mathcal {D}(\alpha _{0,1},\alpha _{1,0})\) be a distribution over pairs \(w,w'\in \{0,1\}^d\) where w is chosen uniformly at random and the ith bit of \(w'\) is obtained by flipping the ith bit of w with probability \(2\alpha _{0,1}\) if \(w_i=0\), and with probability \(2\alpha _{1,0}\) if \(w_i=1\). Hence, the pair \((w_i,w'_i)\) takes the value 01 (respectively, 00, 10 and 11) with probability \(\alpha _{0,1}\) (respectively, \(\frac{1}{2}-\alpha _{0,1}\), \(\alpha _{1,0}\) and \(\frac{1}{2}-\alpha _{1,0}\)). For \(\alpha \in [0,1]\) let \(s_P(\alpha )\) denote the infimum of \(\Pr _{(w,w')\mathop {\leftarrow }\limits ^{R}\mathcal {D}(\alpha _{0,1},\alpha _{1,0})}[P(w)\ne P(w')]\) taken over all \(\alpha _{0,1}\) and \(\alpha _{1,0}\) which sum up to \(\alpha \). Call \(x\in \{0,1\}^n\) typical if its Hamming weight is \(n/2\pm n^{2/3}\). By a Chernoff bound, a random string is typical with all but negligible probability. The following lemma relates \(s_P(x,x')\) to \(s_P(\alpha )\).
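For a constant-arity predicate, \(s_P(\alpha )\) can be computed by brute force: the probability \(\Pr [P(w)\ne P(w')]\) is a polynomial in the pair distribution \(z=(z_{00},z_{01},z_{10},z_{11})\) (this is the polynomial Q(z) used in the proof of Lemma 3.1 below), and the infimum over the splits of \(\alpha \) can be approximated on a grid. A Python sketch (names are ours):

```python
from itertools import product

def Q(P, d, z):
    """Pr[P(w) != P(w')] when the d bit-pairs (w_i, w'_i) are drawn
    i.i.d. from the distribution z = (z00, z01, z10, z11)."""
    total = 0.0
    for w in product((0, 1), repeat=d):
        for w2 in product((0, 1), repeat=d):
            if P(w) != P(w2):
                prob = 1.0
                for a, b in zip(w, w2):
                    prob *= z[2 * a + b]
                total += prob
    return total

def s(P, d, alpha, grid=500):
    """Approximate s_P(alpha) by minimizing Q over the feasible splits
    of alpha into alpha_{0,1} + alpha_{1,0}."""
    lo, hi = max(0.0, alpha - 0.5), min(alpha, 0.5)
    best = 1.0
    for k in range(grid + 1):
        a01 = lo + (hi - lo) * k / grid
        a10 = alpha - a01
        best = min(best, Q(P, d, (0.5 - a01, a01, a10, 0.5 - a10)))
    return best
```

For parity the minimum is independent of the split and equals \((1-(1-2\alpha )^d)/2\), which gives a convenient sanity check.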

Lemma 3.1

For every predicate P, the function \(s_P(\alpha )\) is well defined and continuous. Also, for every typical \(x\in \{0,1\}^n\) and every string \(x'\in \{0,1\}^n\)

$$\begin{aligned} s_P(x,x')\ge s_P(\Delta (x,x'))- \delta (n), \end{aligned}$$

where the error term \(\delta (n)=o(1)\).

Proof

Fix P and let \(s(\cdot )=s_P(\cdot )\). Let \(\mathcal {D}\) be an arbitrary probability distribution over pairs of bits, which is described by the probability vector \(z=(z_{00},z_{01},z_{10},z_{11})\). Sample a pair of d-bit strings \(w,w'\in \{0,1\}^d\) by collecting d independent samples of bit pairs \((w_i,w'_i)\) from \(\mathcal {D}\). Then the quantity \(\Pr _{(w,w')}[P(w)\ne P(w')]\) can be written as a degree d multivariate polynomial in z:

$$\begin{aligned} Q(z)=\mathop {\mathop {\sum }\limits _{{w,w'\in \{0,1\}^d}}}\limits _{{P(w)\ne P(w')}}\prod _{i=1}^d z_{w_iw'_i}. \end{aligned}$$

Specifically, for every fixed \(\alpha \) we can write \(s(\alpha )\) as

$$\begin{aligned} \inf _{\alpha _{0,1}\in \left[ \max (0,\alpha -\frac{1}{2}),\min (\alpha ,\frac{1}{2})\right] }Q(z) \end{aligned}$$

where

$$\begin{aligned} z_{01}=\alpha _{0,1}, z_{00}=\frac{1}{2}-\alpha _{0,1}, z_{10}=\alpha -\alpha _{0,1}, z_{11}=\frac{1}{2}-(\alpha -\alpha _{0,1}). \end{aligned}$$

Hence, we are minimizing a degree d univariate polynomial over a closed interval, and so \(s(\alpha )\) is well defined. We also conclude that \(s(\alpha )\) is continuous, since it is the minimum of a continuous function (a univariate polynomial whose coefficients depend continuously on \(\alpha \)) over an interval whose endpoints vary continuously with \(\alpha \).

We move on to the second part of the lemma. Fix some pair of n-bit strings x and \(x'\), and define the frequency vector \(z=(z_{00},z_{01},z_{10},z_{11})\) to be \(z_{\sigma _1 \sigma _2}=|\left\{ i:(x_ix'_i)=\sigma _1 \sigma _2\right\} |/n\). Imagine that we were choosing the tuple S uniformly at random from \([n]^d\) allowing repetitions. Then, \(s(x,x')=\Pr _S[P(x_S)\ne P(x'_S)]=Q(z)\). Since this is not the case and the elements of S are chosen without repetitions, the quantity \(s(x,x')\) equals

$$\begin{aligned} Q_n(z_{00},z_{01},z_{10},z_{11}) = \mathop {\mathop {\sum }\limits _{{a,b\in \{0,1\}^d}}}\limits _{{P(a)\ne P(b)}} \prod _{i=1}^d (z_{a_ib_i} -\delta (a,b,i,z_{a_ib_i})), \end{aligned}$$
(2)

where \(\delta (a,b,i,z_{a_ib_i})=\min (|\left\{ j<i: (a_j,b_j)=(a_i,b_i)\right\} |/n,z_{a_ib_i})\). For every \(z=(z_{00},z_{01},z_{10},z_{11})\) we have

$$\begin{aligned} Q(z)-2^{2d} d^2/n \le Q_n(z)\le Q(z), \end{aligned}$$

where the left inequality follows by noting that \(\delta (a,b,i,z_{a_ib_i})\le d/n\) and that for reals \(p_i\ge \delta _i\ge 0\) and integer t we have \(\prod _{i=1}^t (p_i-\delta _i) \ge (\prod _{i} p_i) -\sum _i\delta _i\). Now assume that x is typical, and let z be the frequency vector of x and \(x'\). By definition, \(z_{00}+z_{01}\in [\frac{1}{2}\pm n^{-1/3}]\) and \(z_{01}+z_{10}=\Delta (x,x')\). By adding or subtracting a quantity of at most \(n^{-1/3}\) to each coordinate of z, we can define a related balanced frequency vector \(z'\) for which \(z'_{00}+z'_{01}=\frac{1}{2}\) and \(z'_{01}+z'_{10}=\Delta (x,x')\). Observe that in this case \(Q(z)\ge Q(z')-2^{2d}d/n^{1/3}\), and overall it follows that

$$\begin{aligned} s(x,x') = Q_n(z)\ge Q(z)-2^{2d} d^2/n \ge Q(z')-(2^{2d} d^2/n)-(2^{2d}d/n^{1/3}). \end{aligned}$$

Since, by definition, \(Q(z')\) is lower bounded by \(s(z'_{01}+z'_{10})=s(\Delta (x,x'))\), it follows that \(s(x,x')\ge s(\Delta (x,x'))-\delta (n)\) where \(\delta (n)=2^{2d} d^2/n + 2^{2d}d/n^{1/3}=o(1)\), and the lemma follows. \(\square \)
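As a sanity check, the polynomial Q(z) and the resulting \(s_P(\alpha )\) can be evaluated numerically. The following Python sketch (our own illustration, not part of the construction) enumerates Q(z) for the 3-wise XOR predicate, for which Q does not depend on \(\alpha _{01}\) and \(s(\alpha )\) has the closed form \(\frac{1}{2}-\frac{1}{2}(1-2\alpha )^{d}\); the grid minimization uses the balanced marginals \(z_{00}+z_{01}=z_{10}+z_{11}=\frac{1}{2}\).

```python
import itertools

def q_poly(P, d, z):
    """Q(z): probability that P(w) != P(w') when each coordinate pair
    (w_i, w'_i) is drawn i.i.d. with Pr[(w_i, w'_i) = (a, b)] = z[(a, b)]."""
    total = 0.0
    for w in itertools.product([0, 1], repeat=d):
        for wp in itertools.product([0, 1], repeat=d):
            if P(w) != P(wp):
                p = 1.0
                for a, b in zip(w, wp):
                    p *= z[(a, b)]
                total += p
    return total

def s_of_alpha(P, d, alpha, steps=200):
    """s_P(alpha): minimize Q over the balanced distributions D(a01, alpha - a01)."""
    lo, hi = max(0.0, alpha - 0.5), min(alpha, 0.5)
    best = 1.0
    for t in range(steps + 1):
        a01 = lo + (hi - lo) * t / steps
        z = {(0, 1): a01, (0, 0): 0.5 - a01,
             (1, 0): alpha - a01, (1, 1): 0.5 - (alpha - a01)}
        best = min(best, q_poly(P, d, z))
    return best

# For XOR the chi_i = w_i xor w'_i are independent Bernoulli(alpha) variables,
# so Q is constant in a01 and s(alpha) = 1/2 - 1/2 (1 - 2 alpha)^d.
xor3 = lambda w: w[0] ^ w[1] ^ w[2]
for alpha in (0.1, 0.5, 0.9):
    assert abs(s_of_alpha(xor3, 3, alpha) - (0.5 - 0.5 * (1 - 2 * alpha) ** 3)) < 1e-9
```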

3.3 Good Predicates

Definition 3.2

(Good predicates) We say that P is \((\beta ,\gamma )\)-good if:

  1. The value of \(s_P(\cdot )\) is lower bounded by \(\gamma \) in the interval \([\beta ,1]\); and

  2. P has a sensitive coordinate, meaning that \(P(w)=w_1\oplus P'(w_2,\ldots ,w_d)\) for some \((d-1)\)-ary predicate \(P'\).

Motivation. Recall that in Sect. 3.1, we described a game in which an adversary is given a random string x and outputs an \(\alpha \)-far string \(x'\) with the hope of minimizing \(s_P(x,x')\). (The latter quantity essentially approximates the distance between \(f_{G,P}(x)\) and \(f_{G,P}(x')\) for a random G.) Property (1) above guarantees that as long as \(\alpha >\beta \), the value of \(s_P(x,x')\) will be at least \(\gamma \) (except for the negligible event where x is non-typical). The second property of Definition 3.2 is needed for two reasons. First, it allows us to use a theorem from [4] which reduces the pseudorandomness of the ensemble \(\mathcal {F}_{P,n,m}\) to its one-wayness. In addition, it is not hard to verify that this condition implies that \(s_P(\frac{1}{2})=\frac{1}{2}\). The latter property implies that for a proper output length \(\ell \), the ensemble \(\mathcal {F}_{P,n,\ell }\) satisfies the following: If a pair of images \(y=f(x)\) and \(y'=f(x')\) is highly correlated, then the preimages x and \(x'\) must also have a non-trivial correlation. This property (to be formalized in Claim 4.4) will turn out to be useful later.

Usage. In Sect. 4 we will use a \((\beta ,\gamma )\)-good predicate P to construct \(\beta \)-RTCRs with a shrinkage factor of \(1-\varepsilon \) for a constant \(\varepsilon \in (0,\frac{1}{2})\) which satisfies the inequality

$$\begin{aligned} \varepsilon <1-\frac{1}{2(1-H_2(\frac{1}{2}-\gamma ))}, \end{aligned}$$
(3)

where \(H_2\) denotes the binary entropy function. As a result, we would like to have a small value of \(\beta >0\) and a large value of \(\gamma < \frac{1}{2}\) (which leads to a larger \(\varepsilon \) and better shrinkage). It turns out that by increasing the locality, one can simultaneously push \(\beta \) arbitrarily close to 0 and \(\gamma \) arbitrarily close to \(\frac{1}{2}\). This is illustrated by the following family of predicates.
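The trade-off in Eq. 3 is easy to tabulate. The following Python sketch (an illustration of the bound, with hypothetical sample values of \(\gamma \)) computes the largest admissible \(\varepsilon \) and confirms that it approaches \(\frac{1}{2}\) as \(\gamma \rightarrow \frac{1}{2}\).

```python
import math

def h2(p):
    """Binary entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def eps_bound(gamma):
    """Upper bound on the shrinkage constant eps from Eq. 3."""
    return 1 - 1 / (2 * (1 - h2(0.5 - gamma)))

# As gamma -> 1/2, H_2(1/2 - gamma) -> 0 and the bound tends to 1/2,
# i.e., the achievable shrinkage factor 1 - eps tends to 1/2.
assert eps_bound(0.5) == 0.5
assert 0 < eps_bound(0.45) < eps_bound(0.49) < 0.5
```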

Lemma 3.3

Let Q be a c-ary predicate for which \(s_Q(1)\le \frac{1}{2}\). For all constants \(\gamma <\frac{1}{2}\) and \(\beta >0\) there exists a constant d for which the predicate

$$\begin{aligned} P(x_1,\ldots ,x_d,x_{d+1},\ldots , x_{d+c})=(x_{1}\oplus \ldots \oplus x_{d})\oplus Q(x_{d+1},\ldots , x_{d+c}) \end{aligned}$$

is \((\beta ,\gamma )\)-good.

Proof

Fix some constants \(\gamma \in (0,\frac{1}{2})\) and \(\beta >0\). We will show that for sufficiently large odd d (whose value will be determined later) the predicate P is \((\beta ,\gamma )\)-good. Clearly the predicate has a sensitive coordinate, and so it remains to show that for every \(\alpha \in (\beta ,1)\) and every \(\alpha _{01}\in [\max (0,\alpha -\frac{1}{2}),\min (\alpha ,\frac{1}{2})]\)

$$\begin{aligned} \mathop {\Pr }\limits _{(w,w')\mathop {\leftarrow }\limits ^{R}\mathcal {D}(\alpha _{01},\alpha -\alpha _{01})}[P(w)\ne P(w')]>\gamma . \end{aligned}$$
(4)

Since P is computed by applying the predicate Q and the d-wise XOR predicate on disjoint inputs and XOR-ing the outcomes, we can write the LHS of Eq. 4 as

$$\begin{aligned} q(\alpha ,\alpha _{01})\cdot (1-p_{\oplus }(\alpha ,\alpha _{01})) + (1-q(\alpha ,\alpha _{01}))\cdot p_{\oplus }(\alpha ,\alpha _{01}), \end{aligned}$$
(5)

where

$$\begin{aligned} q(\alpha ,\alpha _{01})= \mathop {\Pr }\limits _{(w,w')\mathop {\leftarrow }\limits ^{R}\mathcal {D}(\alpha _{01},\alpha -\alpha _{01})}\left[ Q(w) \ne Q(w')\right] , \end{aligned}$$

and

$$\begin{aligned} p_{\oplus }(\alpha ,\alpha _{01})= \Pr _{(w,w')\mathop {\leftarrow }\limits ^{R}\mathcal {D}(\alpha _{01},\alpha -\alpha _{01})}\left[ \bigoplus _{i=1}^{d} w_i \ne \bigoplus _{i=1}^{d} w'_i\right] . \end{aligned}$$

Letting \(\chi _i\) denote \(w_i\oplus w'_i\), we can rewrite \(p_{\oplus }(\alpha ,\alpha _{01})\) as \(\Pr [\chi _1\oplus \ldots \oplus \chi _d=1]\). Observe that the \(\chi _i\)’s are independent Bernoulli variables with mean \(\alpha \). Therefore, \(p_{\oplus }(\alpha ,\alpha _{01})\) is just the probability of seeing an odd number of successes when tossing d independent \(\alpha \)-biased coins. It is not hard to verify (e.g., by induction on d) that for odd d we have

$$\begin{aligned} p_{\oplus }(\alpha ,\alpha _{01})=p_{\oplus }(\alpha )=\frac{1}{2}+\frac{1}{2}(2\alpha -1)^{d}. \end{aligned}$$

Overall, Eq. 5 simplifies to

$$\begin{aligned} q(\alpha ,\alpha _{01})(1-2p_{\oplus }(\alpha ))+p_{\oplus }(\alpha ). \end{aligned}$$
(6)

Fix some positive \(\varepsilon <\frac{1}{2}-\gamma \) and let \(\delta \in (0,\frac{1}{2})\) be a small constant for which \(q(\alpha ,\alpha _{01})\le \frac{1}{2} +\varepsilon \) for every \(\alpha \in [1-\delta ,1]\) and \(\alpha _{01}\in [\alpha -\frac{1}{2},\frac{1}{2}]\). Such a \(\delta \) is guaranteed to exist since \(q(1,1/2)=s_{Q}(1)\le \frac{1}{2}\) and since \(q(\cdot ,\cdot )\) is continuous (as shown in the proof of Lemma 3.1). Let d be an odd integer which is larger than \(\max \left( \frac{\log (1-2\gamma )}{\log (1-2\beta )}, \frac{\log (1-2\gamma )}{\log (1-2\delta )}\right) \). We prove that (6) is larger than \(\gamma \) via a case analysis.

Case 1: \(\beta \le \alpha \le \frac{1}{2}\). Observe that both \(q(\alpha ,\alpha _{01})\) and \(1-2p_{\oplus }(\alpha )=-(2\alpha -1)^d\) are non-negative (as \(\alpha \le \frac{1}{2}\) and d is odd). Therefore, (6) is lower bounded by

$$\begin{aligned} p_{\oplus }(\alpha )\ge p_{\oplus }(\beta )=\frac{1}{2}+\frac{1}{2}(2\beta -1)^{d}>\gamma , \end{aligned}$$

where the first inequality follows from the fact that \(p_{\oplus }(\cdot )\) is increasing in the interval \([\beta ,\frac{1}{2}]\), and the last inequality holds as \(d>\log (1-2\gamma )/\log (1-2\beta )\).

Case 2: \(\frac{1}{2}\le \alpha \le 1-\delta \). Since \(1-2p_{\oplus }(\alpha )=-(2\alpha -1)^d\) is negative and \(q(\alpha ,\alpha _{01})\le 1\), (6) is lower bounded by

$$\begin{aligned} 1-2p_{\oplus }(\alpha )+p_{\oplus }(\alpha )=\frac{1}{2}-\frac{1}{2}(2\alpha -1)^d \end{aligned}$$

which is monotonically decreasing in the interval \([\frac{1}{2},1-\delta ]\). It follows that the last term is lower bounded by \(\frac{1}{2}-\frac{1}{2}(1-2\delta )^d\), which is larger than \(\gamma \) since \(d>\log (1-2\gamma )/\log (1-2\delta )\).

Case 3: \(1-\delta \le \alpha \le 1\). Since \(\alpha > \frac{1}{2}\), the term \(1-2p_{\oplus }(\alpha )=-(2\alpha -1)^d\) is negative, and so (6) is minimized when \(q(\alpha ,\alpha _{01})\) is maximized. Recall that \(q(\alpha ,\alpha _{01})\le \frac{1}{2} +\varepsilon \) for \(\alpha \ge 1-\delta \). Overall, (6) is lower bounded by

$$\begin{aligned} \left( \frac{1}{2}+\varepsilon \right) \left( 1-2p_{\oplus }(\alpha )\right) +p_{\oplus }(\alpha )=\frac{1}{2}+\varepsilon -2\varepsilon p_{\oplus }(\alpha )\ge \frac{1}{2}-\varepsilon >\gamma , \end{aligned}$$

as required. \(\square \)
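The closed form for \(p_{\oplus }\) used above can be double-checked by summing the odd binomial terms directly; the Python sketch below (a verification aid only) does so for a few odd values of d.

```python
from math import comb

def p_xor_exact(alpha, d):
    """Probability of an odd number of successes among d independent
    alpha-biased coins, computed by summing the odd binomial terms."""
    return sum(comb(d, k) * alpha**k * (1 - alpha)**(d - k)
               for k in range(1, d + 1, 2))

# For odd d this matches the closed form 1/2 + 1/2 (2 alpha - 1)^d.
for d in (1, 3, 7):
    for alpha in (0.0, 0.25, 0.5, 1.0):
        assert abs(p_xor_exact(alpha, d) - (0.5 + 0.5 * (2 * alpha - 1) ** d)) < 1e-12
```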

Concrete instantiation. Observe that the condition \(s_Q(1)\le \frac{1}{2}\) simply means that \(\Pr _w[Q(w)\ne Q(\bar{w})]\le \frac{1}{2}\), where w is a random c-bit string and \(\bar{w}\) is the complement of w. Concretely, we suggest letting Q be the c-wise AND, for an arbitrary constant \(c\ge 2\). (In this case, \(\Pr _w[\bigwedge (w)\ne \bigwedge (\bar{w})]= 2\cdot 2^{-c}\le \frac{1}{2}\).) This leads to the following family of good predicates

$$\begin{aligned} \mathsf {MST}_{d,c}=x_1 \oplus \ldots \oplus x_{d} \oplus (x_{d+1}\wedge \ldots \wedge x_{d+c}) \end{aligned}$$
(7)

which generalizes the predicate from [22]. The previous lemma implies that for all constants \(\gamma <\frac{1}{2}\) and \(\beta >0\) and every integer \(c\ge 2\) there exists a constant d for which \(\mathsf {MST}_{d,c}\) is \((\beta ,\gamma )\)-good.
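For concreteness, the following Python sketch (an illustration; the function names are ours) evaluates \(\mathsf {MST}_{d,c}\) and verifies by enumeration that the c-wise AND satisfies \(\Pr _w[\bigwedge (w)\ne \bigwedge (\bar{w})]=2\cdot 2^{-c}\).

```python
import itertools

def mst(d, c, x):
    """MST_{d,c}(x) = x_1 xor ... xor x_d xor (x_{d+1} and ... and x_{d+c})."""
    return (sum(x[:d]) % 2) ^ int(all(x[d:d + c]))

def flip_agreement(Q, c):
    """Pr_w[Q(w) != Q(complement of w)] over a uniform c-bit string w."""
    cnt = sum(Q(w) != Q(tuple(1 - b for b in w))
              for w in itertools.product([0, 1], repeat=c))
    return cnt / 2**c

# The two sides of the AND differ only on w = 0^c and w = 1^c, so the
# probability is exactly 2 * 2^{-c}, which is at most 1/2 for c >= 2.
for c in (2, 3, 5):
    assert flip_agreement(lambda w: int(all(w)), c) == 2 / 2**c
```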

4 Random Local Functions are \((\delta ,\beta )\)-RTCR

In Sect. 4.1 we prove the following theorem.

Theorem 4.1

Let P be a \((\beta ,\gamma )\)-good predicate. Assume there exists a constant \(\varepsilon \in (0,\frac{1}{2})\) which satisfies Eq. 3, and let \(m=(1-\varepsilon )n\). Then, there exists a constant \(\mu >0\) such that for every \(\delta _1(n)\) and \(\delta _2(n)\), if \(\mathcal {F}_{P,n,2m}\) is both \(\delta _1\)-pseudorandom and \((\delta _2,\frac{1}{2}-\mu )\) one-way, then \(\mathcal {F}_{P,n,m}\) is a \(\delta '\)-secure \(\beta \)-RTCR where \(\delta '=\delta _1+\delta _2+{\mathrm {neg}}(n)\).

In our proof the negligible overhead in the \(\delta '\) expression is shown to be \(2^{-\Omega (n^{1/3})}\). We did not attempt to optimize this term and it seems that a more careful analysis yields an exponential expression of \(2^{-cn}\) where the constant c depends on the predicate P.

It turns out that, for random local functions, approximate one-wayness follows from one-wayness [10], which, in turn, (trivially) follows from pseudorandomness. Therefore, the implication of Theorem 4.1 can be based solely on pseudorandomness. Formally, we derive the following corollary.

Corollary 4.2

Let P be a \((\beta ,\gamma )\)-good d-ary predicate and assume that \(\varepsilon >0\) is a constant that satisfies Eq. 3. Then, there exists a constant \(c=c(P,\varepsilon )>0\) such that for every \(\delta \), if \(\mathcal {F}_{P,n,cn}\) is \(\delta \)-pseudorandom then \(\mathcal {F}_{P,n,(1-\varepsilon )n}\) is \(3\delta \)-secure \(\beta \)-RTCR.

Proof

Fix \(P,\varepsilon \), and let \(\mu =\mu (\varepsilon ,P)>0\) be the constant guaranteed by Theorem 4.1. Let \(\delta \) be an arbitrary inverse polynomial. Assume that \(\mathcal {F}_{P,n,cn}\) is \(\delta \)-pseudorandom for some sufficiently large constant \(c=c(P,\mu )\) whose value will be determined later. By employing Theorem 4.1 with \(\delta _1=\delta _2=\delta \), it suffices to show that \(\mathcal {F}_{P,n,2(1-\varepsilon )n}\) is both \(\delta \)-pseudorandom and \((\delta ,\frac{1}{2}-\mu )\) one-way. The pseudorandomness condition is trivially satisfied for \(c>2\). To establish approximation-resilient one-wayness, we first observe that since \(\mathcal {F}_{P,n,cn}\) is \(\delta \)-pseudorandom it must also be \(\delta '\) one-way for \(\delta '=\delta +2^{(1-c)n}<\delta +o(1)\), assuming that \(c>1\). (To see this just use the hypothetical \(\delta '\)-inverter as a distinguisher in the straightforward way, cf. [15, Section 3.3.6].) Next, we employ a theorem of Bogdanov and Qiao [10, Theorem 1.3] which asserts that for every constant \(\mu >0\) there exists a constant \(k=k(\mu ,d)\) such that if \(\mathcal {F}_{P,n,kn}\) is \(\delta '\) one-way then it is also \((\delta '+o(1),\frac{1}{2}-\mu )\) one-way (for every inverse polynomial \(\delta '\)). Letting c be a sufficiently large constant (e.g., larger than \(\max (k,2)\)), the corollary follows. \(\square \)

We note that the corollary is valid even if \(\delta \) decreases with n (as long as it is inverse polynomial), although we will employ it only with small constant values.

4.1 Proof of Theorem 4.1

Let \(\mu \) be a constant which depends on P and \(\varepsilon \) and whose value will be determined later. Assume, toward a contradiction, that \(\mathcal {F}_{P,n,m}\) is not a \(\delta '\)-secure \(\beta \)-RTCR. Namely, there exists an efficient adversary \(\mathcal {A}\) which, given a random target \(w\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\) and a random graph \(G\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,m,d}\), finds, with probability \(\delta '\), a string z which is a \(\beta \)-far sibling of w under \(f_G\). It will be convenient to further assume that \(\mathcal {A}\) has a similar success probability when G is a random \((n,m',d)\) graph for \(m'<m\). This is without loss of generality, since such a graph G can always be padded into a random \((n,m,d)\) graph H; clearly any \(\beta \)-far sibling of w under \(f_H\) is also a \(\beta \)-far sibling of w under \(f_G\).

Assume that \(\mathcal {F}_{P,n,2m}\) is \(\delta _1\)-pseudorandom. We construct an attacker \(\mathcal {B}\) who breaks the \((\delta _2,\frac{1}{2}-\mu )\) one-wayness of \(\mathcal {F}_{P,n,2m}\). Given a graph \(G=(S_1,\ldots ,S_{2m})\) and a string \(y\in \{0,1\}^{2m}\), the algorithm \(\mathcal {B}\) is defined as follows:

  1. Randomly choose \(w\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\) and let \(r=f_{G,P}(w)\oplus y\). (Think of r as representing the set of indices on which y and the image of w disagree.)

  2. Fail if \(m_0\), the number of 0's in r, is smaller than \(m-m^{2/3}\) or larger than \(m+m^{2/3}\).

  3. Let \(I_0\) be the set of the first \(\min (m_0,m)\) indices i for which \(r_i=0\), and let \(I_1=\left\{ i:r_i=1\right\} \). Let \(G_0=\left\{ S_i:i\in I_0\right\} \) and \(G_{1}=\left\{ S_i:i\in I_1\right\} \). (Note that \(f_{G_0,P}(w)=y_{I_0}\) and that \(f_{G_1,P}(w)=\mathbf 1 \oplus y_{I_1}\).)

  4. Apply \(\mathcal {A}\) to \((G_0,w)\) and let \(z\in \{0,1\}^n\) denote the resulting output.

  5. If \(P(z_{S_i})=y_i\) for at least \(m(1+\gamma )-3m^{2/3}\) of the indices \(i \in [2m]\), output z; otherwise, fail.
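The bookkeeping in steps 1–3 can be checked mechanically. The Python sketch below (a toy instance with hypothetical parameters; tuples are sampled with repetitions and the \(\min (m_0,m)\) truncation is omitted, neither of which affects the invariant) verifies that \(f_{G_0,P}(w)=y_{I_0}\) and \(f_{G_1,P}(w)=\mathbf 1 \oplus y_{I_1}\).

```python
import random

def f_GP(G, P, x):
    """Random local function: apply P to each d-tuple of the graph G."""
    return [P(tuple(x[i] for i in S)) for S in G]

# Hypothetical small instance: P is 3-XOR, n = 20 inputs, 2m = 16 outputs.
random.seed(1)
n, two_m, d = 20, 16, 3
P = lambda w: w[0] ^ w[1] ^ w[2]
G = [tuple(random.randrange(n) for _ in range(d)) for _ in range(two_m)]
y = [random.randrange(2) for _ in range(two_m)]
w = [random.randrange(2) for _ in range(n)]

r = [a ^ b for a, b in zip(f_GP(G, P, w), y)]    # step 1: disagreement pattern
I0 = [i for i in range(two_m) if r[i] == 0]
I1 = [i for i in range(two_m) if r[i] == 1]
G0, G1 = [G[i] for i in I0], [G[i] for i in I1]  # step 3: split the graph

assert f_GP(G0, P, w) == [y[i] for i in I0]       # f_{G0,P}(w) = y_{I0}
assert f_GP(G1, P, w) == [1 ^ y[i] for i in I1]   # f_{G1,P}(w) = 1 xor y_{I1}
```

The two asserted identities hold by construction for any choice of G, y and w, since \(r_i=0\) exactly when the i-th output of w agrees with \(y_i\).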

We begin by bounding the failure probability of the algorithm. Intuitively, the algorithm does not fail due to the following reasoning. Assuming that z is a collision, we have that \(P(z_{S_i})=y_i\) for all the indices \(i\in I_0\). In addition, if z is \(\beta \)-far from w and statistically independent of \(G_1\), then (since P is \((\beta ,\gamma )\)-good) the outputs \(f_{G_1,P}(w)\) and \(f_{G_1,P}(z)\) are expected to disagree on a set of \(\gamma m\) coordinates. Since \(f_{G_1,P}(w)=\mathbf 1 \oplus y_{I_1}\), this translates to \(\gamma m\) indices in \(I_1\) for which \(P(z_{S_i})=y_i\). The above analysis is inaccurate, as the random variables z and \(G_1\) are statistically dependent (via the random variable \((w,G_0)\)). Still, the above approach can be used when the input y (as well as the graph G) is truly random.

Claim 4.3

\(\Pr _{G\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,2m,d}, y\mathop {\leftarrow }\limits ^{R}\{0,1\}^{2m}}[\mathcal {B}(G,y) \text { does not fail}]>\delta '-2^{-\Omega (n^{1/3})}\).

Proof

When the pair (Gy) is uniformly chosen, the process \(\mathcal {B}(G,y)\) can be equivalently described as follows. In the first step, we choose \(S_1,\ldots ,S_{2m}\) uniformly at random, choose a random string \(w\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\), and a random string \(r\mathop {\leftarrow }\limits ^{R}\{0,1\}^{2m}\). We let \(y=f_{G,P}(w)\oplus r\). Then steps 2–5 are performed exactly as before. This process is clearly equivalent to \(\mathcal {B}(G,y)\), but easier to analyze. The main observation is that the string w is statistically independent of the graphs \(G_0\) and \(G_1\) which are just random graphs (whose size is determined by the random variable r).

Specifically, consider the following event:

  1. The Hamming weight of r is \(m\pm m^{2/3}\);

  2. \(\mathcal {A}\) outputs a \(\beta \)-far collision z;

  3. The Hamming weight of w is \(n/2\pm n^{2/3}\);

  4. \(P(z_{S_i})=y_i\) for at least \(m(1+\gamma )-3m^{2/3}\) of the indices \(i \in [2m]\).

By a Chernoff bound, Event (1) happens with probability \(1-2^{-\Omega (n^{1/3})}\). Fix some r which satisfies (1) and let \(m_1\in m\pm m^{2/3}\) be the Hamming weight of r. Now, w is a random string and \(G_0\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,m,d}\); hence, \(\mathcal {A}\) is invoked on the “right” probability distribution and (2) happens with probability \(\delta '\). By a Chernoff bound, (3) happens with all but probability \(2^{-\Omega (n^{1/3})}\). Therefore, by a union bound, (1)–(3) happen simultaneously with probability \(\delta '-2^{-\Omega (n^{1/3})}\). Fix some w and \(G_0\) which satisfy (2) and (3), and let us move to (4).

Since w and z form a collision under \(f_{G_0,P}\), we have that \(f_{G_0,P}(z) =y_{I_0}\) and therefore \(P(z_{S_i})=y_i\) for all the indices \(i\in I_0\). Recalling that \(|I_0|\ge m-m^{2/3}\), it suffices to show that \(P(z_{S_i})=y_i\) for at least

$$\begin{aligned} (\gamma - m^{-1/3})m_1\ge \gamma m - 2m^{2/3} \end{aligned}$$

of the indices in \(I_1\). (Recall that \(m_1>m-m^{2/3}\).) We claim that this happens with all but negligible probability (taken over the random choice of \(G_1\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,m_1,d}\)). To see this, define for every \(i\in I_1\) a random variable \(\xi _i\) which equals one if \(P(z_{S_i})=y_i\). Equivalently, \(\xi _i=1\) if \(P(z_{S_i})\ne P(w_{S_i})\). Furthermore, since the tuples \(S_i\) are distributed uniformly and independently, each \(\xi _i\) takes the value 1 independently with probability at least

$$\begin{aligned} s_P(w,z)\ge s_P(\Delta (w,z))-o(1)>\gamma \end{aligned}$$

where the first inequality follows from Lemma 3.1 and the fact that w is “typical” (of Hamming weight \(n/2\pm n^{2/3}\)); and the second inequality follows from the goodness of P and the fact that \(\Delta (w,z)\ge \beta \). Therefore, by Chernoff’s bound,

$$\begin{aligned} \Pr \left[ \sum \xi _i< (\gamma -m^{-1/3})m_1 \right]< 2^{-D_2(\gamma -m^{-1/3}\Vert \gamma )m_1}<2^{-\Omega (m^{1/3})}<2^{-\Omega (n^{1/3})}. \end{aligned}$$

By summing all the error terms, we derive the claim. \(\square \)

Moving back to the case where y is an image of a random string x, we show that when \(\mathcal {B}\) does not fail its output is likely to be correlated with x.

Claim 4.4

Assume that \(\varepsilon \) and \(\gamma \) satisfy Eq. 3, then there exists a constant \(\mu =\mu (P,\varepsilon )\) such that the following holds. With probability \(1-2^{-\Omega (n^{1/3})}\) over the choice of \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\) and \(G\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,2m,d}\), there is no string z such that \(f_{G,P}(x)\) and \(f_{G,P}(z)\) agree on at least \(m(1+\gamma )-3m^{2/3}\) coordinates but \(\Delta (x,z)\in (\frac{1}{2}\pm \mu )\).

Proof

Let \(\mu >0\) be a small constant for which the value of \(s_P(\cdot )\) in the interval \((\frac{1}{2}\pm \mu )\) is lower bounded by a constant \(\eta \) which satisfies \(\eta >\frac{1}{2}-\gamma \) and

$$\begin{aligned} 2(1-\varepsilon )D_2\left( \frac{1}{2}-\gamma \Vert \eta \right) >1. \end{aligned}$$
(8)

To see that such a \(\mu \) exists, we make the following observations. First, for \(\mu =0\) the conditions are satisfied. Indeed, \(\eta \) can be taken to be \(\frac{1}{2}\) (recall that \(s_P(\frac{1}{2})=\frac{1}{2}\)), and, since \(D_2(\frac{1}{2}-\gamma \Vert \frac{1}{2})=1-H_2(\frac{1}{2}-\gamma )\), Eq. 8 simplifies to \(2(1-\varepsilon )(1-H_2(\frac{1}{2}-\gamma ))>1\), which follows from Eq. 3. Now, since \(s_P\) is a continuous function, and the LHS of Eq. 8 is also continuous in \(\eta \), we conclude that both conditions also hold for a sufficiently small constant \(\mu >0\).

Let us condition on the event that x is typical (as in Lemma 3.1), which, by a Chernoff bound, happens with all but \(2^{-\Omega (n^{1/3})}\) probability. Fix some string z for which \(\Delta (x,z)\in (\frac{1}{2}\pm \mu )\). For a random d-size tuple S we have, by Lemma 3.1, that \(\Pr [P(x_S)\ne P(z_S)]\ge s_P(\Delta (x,z))-o(1)\ge \eta -o(1)>\frac{1}{2}-\gamma \). Let \(G=(S_1,\ldots ,S_{2m})\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,2m,d}\). Since each tuple \(S_i\) is chosen independently and uniformly at random, we can upper bound (via Chernoff) the probability that \(f_{G,P}(x)\) and \(f_{G,P}(z)\) disagree on less than \(2m-(m(1+\gamma )-3m^{2/3})= (1-\gamma )m +3m^{2/3}\) of the coordinates by

$$\begin{aligned} p=2^{-2m D_2\left( \frac{1}{2}-\gamma +o(1)\Vert s(x,z)\right) }\le 2^{-2(1-\varepsilon ) D_2\left( \frac{1}{2}-\gamma +o(1)\Vert \eta -o(1)\right) n}. \end{aligned}$$

By a union bound over all z’s, we get that the claim holds with all but probability \(p\cdot 2^n\), which, by Eq. 8, is upper bounded by \(2^{-\Omega (n)}\). \(\square \)

We can now complete the proof of the theorem. Let \(G\mathop {\leftarrow }\limits ^{R}\mathcal {G}_{n,2m,d}\) and \(y=f_{G,P}(x)\) where \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n\). Consider the event that: (1) G and x satisfy Claim 4.4 and (2) \(\mathcal {B}(G,y)\) does not fail and outputs the string z. In this case, either the string z or its negation has a non-trivial agreement of \(\frac{1}{2}+\mu \) with x, which may happen with probability at most \(\delta _2\) due to the approximate one-wayness of \(\mathcal {F}_{P,n,2m}\). Hence, it suffices to show that the above event happens with probability at least \(\delta '-\delta _1-2^{-\Omega (n^{1/3})}\). Indeed, (1) happens with all but probability \(2^{-\Omega (n^{1/3})}\) (due to Claim 4.4), and (2) happens with probability \(\delta '-2^{-\Omega (n^{1/3})}-\delta _1\) due to Claim 4.3 and the fact that \((G,y)\) is \(\delta _1\)-indistinguishable from \((G,y')\) for a truly random \(y'\mathop {\leftarrow }\limits ^{R}\{0,1\}^{2m}\). \(\square \)

5 From \((\delta ,\beta )\)-RTCR to TCR

In this section we transform a \(\delta \)-secure \(\beta \)-RTCR with a shrinkage factor of \(1-\varepsilon \) and constant output locality into a (standard) TCR with a constant shrinkage factor \(\varepsilon '\), constant input locality and constant output locality. Interestingly, we can do this without increasing the algebraic degree. Formally, we prove the following theorem.

Theorem 5.1

For every constant \(\varepsilon \in (0,1)\) there exist universal constants \(\delta ,\beta \in (0,1)\) for which any \(\delta \)-secure \(\beta \)-RTCR \(\mathcal {H}\) with shrinkage factor of \(1-\varepsilon \) and constant output locality can be transformed into a TCR \(\mathcal {H}'\) with shrinkage factor of \(1-\varepsilon /4\), constant input locality, constant output locality and the same algebraic degree as \(\mathcal {H}\). Furthermore, one can obtain an arbitrary constant shrinkage factor of \(\varepsilon '\) at the expense of further increasing the input and output localities to a larger constant (which grows exponentially in \(\log (\varepsilon ')/\log (1-\varepsilon )\)).

The proof relies on a sequence of transformations (described in Sects. 5.1–5.2) in which we gradually amplify each of the parameters of the underlying collection while keeping the output locality constant. We refer to such transformations as local. Finally, we observe that once constant output locality and a constant shrinkage factor are achieved, constant input locality can also be guaranteed (with a minor loss in the shrinkage).

We note that the theorem can be adapted to the setting of collision-resistant hash functions. Namely, it allows one to convert a \(\delta \)-secure \(\beta \)-collision-resistant hash function with shrinkage factor \(1-\varepsilon \) and constant output locality into a standard collision-resistant hash function with arbitrary constant shrinkage, constant input locality and constant output locality.

5.1 Standard Transformations

We begin with two standard transformations.

Claim 5.2

(RTCR to TCR) Let \(\mathcal {H}=\left\{ h_k\right\} \) be \(\delta \)-secure \(\beta \)-RTCR with shrinkage factor of \(1-\varepsilon \). Then the collection \(\mathcal {H}'=\left\{ h'_{k,y}\right\} \) defined by \(h'_{k,y}(x)=h_k(x\oplus y)\) is \(\delta \)-secure \(\beta \)-TCR with the same shrinkage, output locality, input locality and algebraic degree as \(\mathcal {H}\).

Proof

Let \((\mathcal {A}_1,\mathcal {A}_2)\) be an adversary that contradicts the claim. We construct an adversary \(\mathcal {B}\) that contradicts the hypothesis. Given a random RTCR challenge \(x\mathop {\leftarrow }\limits ^{R}\{0,1\}^n, h_k\mathop {\leftarrow }\limits ^{R}\mathcal {H}\), the adversary \(\mathcal {B}\) computes \(\mathcal {A}_1(1^n)\) and obtains a target y and a state r. Then, \(\mathcal {B}\) invokes \(\mathcal {A}_2\) with the function \(h'_{k,x\oplus y}\), state information r and target y. Finally, \(\mathcal {B}\) outputs \(x'=y'\oplus y \oplus x\) where \(y'\) is the output of \(\mathcal {A}_2\). The claim follows by noting that if y and \(y'\) form a \(\beta \)-far collision under \(h'_{k,x\oplus y}\), then x and \(x'\) form a \(\beta \)-far collision under \(h_{k}\). Clearly, \(\mathcal {H}'\) has the same shrinkage, output locality, input locality and algebraic degree as \(\mathcal {H}\). \(\square \)

Assume that we already have \(\delta \)-secure standard TCR (\(\beta =0\)) with shrinkage factor of \(1-\varepsilon \). A standard way to amplify the shrinkage factor from \(1-\varepsilon \) to \((1-\varepsilon )^t\) is via iterated self-composition [23].

Claim 5.3

(Amplifying the Shrinkage Factor) Let \(\mathcal {H}=\left\{ h_{k}\right\} \) be a \(\delta \)-secure TCR with shrinkage factor of \(1-\varepsilon \) and key sampler \(\mathcal {K}\). For any constant integer \(t\ge 1\), the collection \(\mathcal {H}^t\) (defined below) is \(t\delta \)-secure TCR with shrinkage factor of \((1-\varepsilon )^t\). The collection \(\mathcal {H}^t\) is defined recursively, via

$$\begin{aligned} \mathcal {H}^t=\left\{ h_{k_1,\ldots ,k_t}\right\} , \quad h_{k_1,\ldots ,k_t}(x)= h_{k_t}(h_{k_1,\ldots ,k_{t-1}}(x)), \quad \text {where } k_i\mathop {\leftarrow }\limits ^{R}\mathcal {K}\left( 1^{n(1-\varepsilon )^{i-1}}\right) . \end{aligned}$$

Furthermore, the construction is local: If \(\mathcal {H}\) has an output (resp., input) locality of \(d=O(1)\), then the new family \(\mathcal {H}^t\) has an output (resp., input) locality of \(d^t=O(1)\).

A proof can be found in [23] (see also [9]).
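The shrinkage bookkeeping of the self-composition can be illustrated as follows. The Python sketch below uses a hypothetical unkeyed "hash" that merely truncates its input (no security is intended); it only demonstrates that t levels of composition yield a shrinkage factor of \((1-\varepsilon )^t\).

```python
def make_h(eps):
    """Hypothetical toy 'hash': keep the first (1-eps)-fraction of the input.
    This models only the shrinkage bookkeeping, not any security property."""
    return lambda x: x[:int(len(x) * (1 - eps))]

def iterate(h_maker, t, eps):
    """H^t: feed the output of level i-1 into level i, so each level is
    applied to an input of length n (1-eps)^{i-1}."""
    def h_t(x):
        for _ in range(t):
            x = h_maker(eps)(x)
        return x
    return h_t

x = list(range(64))
assert len(iterate(make_h, 3, 0.5)(x)) == int(64 * (1 - 0.5) ** 3)  # 64 -> 8
```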

5.2 Hardness Amplification

We move on to amplify the hardness parameter \(\delta \) from constant to negligible at the expense of slightly increasing the distance parameter. Our construction is based on a simple direct product.

Lemma 5.4

(Hardness Amplification) Let \(\mathcal {H}=\left\{ h_k:\{0,1\}^n\rightarrow \{0,1\}^{\varepsilon n}\right\} \) be \(\delta \)-secure \(\beta \)-TCR with key sampler \(\mathcal {K}\). Then, for every polynomial \(t=t(n)\) and every \(\gamma >\delta \), the t-direct product collection \(\mathcal {H}'\) defined via

$$\begin{aligned} h'_{(k_1,\ldots ,k_{t})}:(x_1,\ldots ,x_t) \mapsto \left( h_{k_1}(x_1),\ldots ,h_{k_{t}}(x_t)\right) , \quad \text {where } x_i\in \{0,1\}^n, k_i\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n), \end{aligned}$$

is \((2^{-t(\gamma -\delta )^2}+{\mathrm {neg}}(n))\)-secure \((\beta +\gamma )\)-TCR with the same shrinkage factor, output locality, input locality and algebraic degree as \(\mathcal {H}\).

By taking \(t=n\) and letting \(\gamma \) be a constant which is strictly larger than \(\delta \), we reduce the security error to negligible.

Proof

We will need the following simple observation: Consider a pair of strings \(\mathbf {x}=(x_1,\ldots ,x_t)\in (\{0,1\}^n)^t\) and \(\mathbf {y}=(y_1,\ldots ,y_t)\in (\{0,1\}^n)^t\) which are \((\beta +\gamma )\)-far in Hamming distance. Then, by an averaging argument, it holds that \(\Delta (x_i,y_i)>\beta \) for at least a \(\gamma \)-fraction of the i’s.

We can now prove the lemma. Let \(\mathcal {A}=(\mathcal {A}_1,\mathcal {A}_2)\) be an adversary that finds \((\beta +\gamma )\)-far collisions under \(\mathcal {H}'\) with probability \(\delta '\). Using the above observation it follows that

$$\begin{aligned} \delta '\le \mathop {\mathop {\Pr }\limits _{{\varvec{k}\mathop {\leftarrow }\limits ^{R}\mathcal {K}^{t}(1^n)}}}\limits _{{(\varvec{x},R)\mathop {\leftarrow }\limits ^{R}\mathcal {A}_1(1^n)}}[\mathcal {A}_2(\varvec{k},\varvec{x},R)=\varvec{y} \text { s.t. } |\left\{ i: \Delta (x_i,y_i)>\beta \wedge h_{k_i}(x_i)=h_{k_i}(y_i)\right\} |\ge \gamma t], \end{aligned}$$

where \(\mathbf {k}=(k_1,\ldots ,k_t)\), \(\mathbf {x}=(x_1,\ldots ,x_t)\) and \(\mathbf {y}=(y_1,\ldots ,y_t)\). That is, given t(n) independent samples of \(\mathcal {H}\), the adversary \(\mathcal {A}\) finds \(\beta \)-far collisions on a \(\gamma \)-fraction of them with probability \(\delta '\). A general threshold direct product theorem of Impagliazzo and Kabanets [18, Theorem 5.2] shows that, in this case, the advantage \(\delta '\) is upper bounded by \(2^{-t D_2(\gamma \Vert \delta )}+{\mathrm {neg}}(n)<2^{-t (\gamma -\delta )^2}+{\mathrm {neg}}(n)\). The lemma follows. \(\square \)
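The direct product construction itself is straightforward; the following Python sketch (with a hypothetical toy instance in place of \(h_k\)) illustrates the blockwise application.

```python
def direct_product(hs):
    """t-direct product: apply t independent instances blockwise."""
    def hp(xs):
        assert len(xs) == len(hs)
        return tuple(h(x) for h, x in zip(hs, xs))
    return hp

# Hypothetical toy instances: truncation 'hashes' on 4-bit blocks.
h = lambda x: x[:2]
hp = direct_product([h, h, h])
assert hp(((0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1))) == ((0, 1), (1, 1), (0, 0))
```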

5.3 Reducing the Distance Parameter \(\beta \)

In this section we transform \(\beta \)-TCR to standard TCR (with some loss in hardness and shrinkage). Such a transformation can be easily obtained (non-locally) by encoding the input x via an error-correcting code. Here we provide a local alternative which employs low-density parity-check matrices (LDPC). Such matrices will also be used to amplify the hardness parameter \(\delta \) in the next section.

LDPC. In order to amplify the distance parameter \(\beta \) we will need sparse parity-check matrices of a good code. Let \(m<n\) be an integer. We say that a matrix \(M\in \mathbb {Z}_2^{m \times n}\) has a dual (relative) distance of \(\beta \in (0,1)\) if the Hamming weight of every nonzero codeword \(x\in \ker (M)=\left\{ x|Mx =0\right\} \) is larger than \(\beta n\). We say that an infinite sequence of \(m(n) \times n\) binary matrices \({\mathcal {M}}_{m(n) \times n}=\left\{ M_n\right\} _{n\in \mathbb {N}}\) is a low-density parity-check code with distance \(\beta \) if for every n the matrix \(M_n\) has dual distance of \(\beta \) and \(M_n\) is sparse in the sense that the number of ones in each row and each column is bounded by some absolute constant d which does not depend on n. To make our construction efficient we need an LDPC \({\mathcal {M}}_{m(n) \times n}\) whose nth member can be computed in \(\mathrm{poly}(n)\) time. Such a construction is given in [12].
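For small parameters, the dual distance of a candidate matrix can be computed by exhaustive search over the kernel. The Python sketch below (with a hypothetical \(2\times 4\) matrix; real LDPCs are far larger) illustrates the definition.

```python
import itertools

def relative_dual_distance(M, n):
    """Minimum relative Hamming weight of a nonzero x in ker(M) over GF(2),
    found by brute force (feasible only for small n)."""
    best = 1.0
    for x in itertools.product([0, 1], repeat=n):
        if any(x) and all(sum(m * b for m, b in zip(row, x)) % 2 == 0 for row in M):
            best = min(best, sum(x) / n)
    return best

# Hypothetical 2x4 sparse parity-check matrix; its lightest nonzero kernel
# vector is (0, 0, 1, 1), giving relative dual distance 2/4.
M = [[1, 1, 0, 0],
     [0, 1, 1, 1]]
assert relative_dual_distance(M, 4) == 0.5
```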

Proposition 5.5

For every \(\varepsilon \in (0,1)\) there exists a sequence \({\mathcal {M}}_{\varepsilon n \times n}=\left\{ M_n\right\} _{n\in \mathbb {N}}\) of \(\beta (\varepsilon )\)-LDPC for some \(\beta =\varepsilon /\mathrm{polylog}(1/\varepsilon )\). Furthermore, there exists an efficient algorithm that given \(1^n\) outputs the matrix \(M_n\) in \(\mathrm{poly}(n)\)-time.

Proof

For every constant \(\varepsilon \), Theorem 7.1 of Capalbo et al. [12] provides an explicit (efficiently computable) family of unbalanced bipartite graphs \(\mathcal {G}\) with n “column” nodes and \(m=\varepsilon n\) “row” nodes with constant degree on each side, such that each set of at most \(\beta n\) column nodes has almost full expansion of \(0.99 d\beta n\) row nodes where d is the degree of the column nodes. Sipser and Spielman [27] showed that the adjacency matrix of such a graph is \(\beta \)-LDPC. \(\square \)

Lemma 5.6

(\(\beta \)-TCR to TCR) Let \(\varepsilon '<\varepsilon \) and let \({\mathcal {M}}_{\varepsilon 'n \times n}=\left\{ M_n\right\} \) be a \(\beta \)-LDPC. Let \(\mathcal {H}=\left\{ h_k\right\} \) be a \(\delta \)-secure \(\beta \)-TCR with shrinkage factor of \(1-\varepsilon \) and key sampler \(\mathcal {K}\), and define

$$\begin{aligned} \mathcal {H}'=\left\{ h'_{k}\right\} ,\quad h'_{k}(x)= (h_k(x),M_nx), \qquad \text {where } k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n). \end{aligned}$$

Then, \(\mathcal {H}'\) is \(\delta \)-secure TCR with shrinkage factor of \(1-\varepsilon +\varepsilon '\). Furthermore, the transformation is local and the algebraic degree of \(\mathcal {H}'\) is the same as the degree of \(\mathcal {H}\).

Proof

First, observe that the above transformation is local since \({\mathcal {M}}\) is d-sparse for \(d=O(1)\). Specifically, both the input locality and the output locality grow by an additive term of d. Moreover, since \(M_n\) is applied as a linear operator, the algebraic degree of \(\mathcal {H}'\) equals the algebraic degree of \(\mathcal {H}\). We move on to prove security. Let \(\mathcal {A}=(\mathcal {A}_1,\mathcal {A}_2)\) be a TCR adversary whose collision finder \(\mathcal {A}_2\), given \((x,r)\mathop {\leftarrow }\limits ^{R}\mathcal {A}_1(1^n)\) and \(h'_{k}\mathop {\leftarrow }\limits ^{R}\mathcal {H}'\), finds a collision \(x'\) with x under \(h'_{k}\) with probability \(\delta _{\mathcal {A}}\). We claim that any such collision must be \(\beta \)-far, and so, by our assumption, \(\delta _{\mathcal {A}}<\delta \). Indeed, by definition, \(h_{k}(x)=h_k(x')\) and, in addition, \(M_nx=M_n x'\). It follows that the difference vector \(x\oplus x'\) is a nonzero vector in the kernel of \(M_n\), and therefore \(x\oplus x'\) has a relative weight of at least \(\beta \). The lemma follows. \(\square \)
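The construction of Lemma 5.6 is easy to prototype. The sketch below uses a made-up 4-bit "hash" and a single parity check purely to show the mechanics: a pair that collides under \(h'\) must differ by a kernel vector of M, whereas a pair that collides only under the inner hash is separated by the parity check. All names and parameters here are illustrative, not from the paper.

```python
def h_prime(h, rows, x):
    """The Lemma 5.6 construction: h'_k(x) = (h_k(x), M x) over GF(2)."""
    Mx = tuple(sum(x[j] for j in row) % 2 for row in rows)
    return (tuple(h(x)), Mx)

def toy_h(x):
    # Placeholder 4-to-2 "hash" (pairwise XOR); no security is claimed.
    return [x[0] ^ x[1], x[2] ^ x[3]]

rows = [[0, 1, 2]]   # one sparse parity check: x0 + x1 + x2 = 0

x1 = [0, 0, 1, 1]
x2 = [1, 1, 1, 1]    # collides with x1 under h': difference (1,1,0,0) is in ker(M)
x3 = [1, 1, 0, 0]    # collides with x1 under toy_h alone; the parity check separates them
```

Any genuine \(h'\)-collision forces the difference vector into \(\ker(M)\), which is exactly why a good dual distance makes every collision \(\beta\)-far.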

Remark 5.7

(Using LDPC ensembles) Lemma 5.6 easily generalizes to the case where \({\mathcal {M}}_{\varepsilon 'n \times n}\) forms an ensemble of \(\beta \)-LDPC’s, i.e., there exists an efficient sampler that given \(1^n\) samples a sparse \((\varepsilon 'n \times n)\) binary matrix that, with probability \(1-\delta '\), has a dual distance of \(\beta \). In this case, we modify the key sampler of \(\mathcal {H}'\) to sample a matrix M from \({\mathcal {M}}_{\varepsilon 'n \times n}\) together with a key \(k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^n)\) and let \(h'_{k,M}(x)=(h_k(x),M x)\). It is not hard to show that the resulting collection is \((\delta +\delta ')\)-secure TCR. While our (theoretical) results can be derived without this extension (based on the explicit LDPC’s from Proposition 5.5), the use of ensembles may be beneficial in terms of concrete efficiency. Indeed, most practical LDPC codes (e.g., based on random sparse matrices) yield efficiently samplable ensembles of LDPC’s.

5.4 Reducing the Input Locality

We next show how to reduce the input locality of a TCR with constant output locality and constant shrinkage factor.

Lemma 5.8

(Reducing Input Locality) Assume that there exists a TCR \(\mathcal {H}\) with output locality d and shrinkage factor \(\varepsilon \). Then, for every \(\alpha \in (0,1)\) there exists a TCR \(\mathcal {H}'\) with output locality d, input locality \(d/(\varepsilon \cdot \alpha )\) and shrinkage factor \(\varepsilon /(1-\alpha )\).

Proof

Let us assume, without loss of generality, that for every function \(h_k:\{0,1\}^n\rightarrow \{0,1\}^{\varepsilon n}\) in the collection \(\mathcal {H}\), the input variables \((x_1,\ldots ,x_n)\) are ordered according to their input locality; namely, if \(x_i\) affects \(t_i\) outputs, then \(t_1\le t_2 \le \cdots \le t_n\). We define \(h'_k:\{0,1\}^{n'}\rightarrow \{0,1\}^{n' \cdot \varepsilon /(1-\alpha )}\) by mapping an \(n'=n(1-\alpha )\)-bit string x to the value \(h_k(x,0^{\alpha n})\in \{0,1\}^{n' \cdot \varepsilon /(1-\alpha )}\). Let \(\mathcal {H}'\) denote the collection \(\left\{ h'_k: h_k\in \mathcal {H}\right\} \) equipped with the key sampler \(\mathcal {K}'(1^{n'})=\mathcal {K}(1^n)\) where \(\mathcal {K}\) is the key sampler of \(\mathcal {H}\).

Observe that for every fixed index k, the average input locality of \(h_k\) is at most \(\varepsilon d\) (indeed, each of the \(\varepsilon n\) outputs depends on at most d inputs, so the total number of input–output wires is at most \(d\varepsilon n\)). Therefore, by Markov’s inequality, the fraction of inputs whose locality exceeds \(\varepsilon d/\alpha \) is at most \(\alpha \). It follows that the input locality of \(h'_k\) is at most \(\varepsilon d/\alpha \le d/(\varepsilon \cdot \alpha )=O(1)\), as claimed.

In addition, it is not hard to prove that \(\mathcal {H}'\) is a TCR. Specifically, given a TCR adversary \(\mathcal {A}'=(\mathcal {A}'_1,\mathcal {A}'_2)\) which breaks \(\mathcal {H}'\) with success probability of \(\delta (n')\), we define an adversary \(\mathcal {A}=(\mathcal {A}_1,\mathcal {A}_2)\) which breaks \(\mathcal {H}\) with the same success probability \(\delta (n')=\delta (n(1-\alpha ))\). The target specifier \(\mathcal {A}_1(1^n)\) computes \((x',r)\mathop {\leftarrow }\limits ^{R}\mathcal {A}'_1(1^n)\) and outputs the target string \(x=(x',0^{\alpha n})\) and state information r. The collision finder \(\mathcal {A}_2(k,x,r)\) computes \(y'\mathop {\leftarrow }\limits ^{R}\mathcal {A}'_2(k,x',r)\) and outputs the string \(y=(y',0^{\alpha n})\). The claim follows by noting that for every index k, if the pair \((x',y')\) forms a collision under \(h'_k\), then the padded pair \((x,y)\) forms a collision under \(h_k\). \(\square \)
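The locality-reduction step can be sketched concretely. In the toy example below (our own; the circuit is made up), an output-local function is described by the list of input indices each output reads, and fixing the heaviest \(\alpha\)-fraction of inputs to 0 bounds the locality of the remaining inputs.

```python
def input_localities(outputs, n):
    """t[j] = number of outputs that read input j."""
    t = [0] * n
    for deps in outputs:
        for j in deps:
            t[j] += 1
    return t

def restrict(outputs, keep):
    """Fix all inputs outside `keep` to 0, i.e., drop them from each output's list."""
    keep = set(keep)
    return [[j for j in deps if j in keep] for deps in outputs]

# Toy circuit: 3 outputs over 4 inputs; input 3 is read by every output.
outputs = [[0, 3], [1, 3], [2, 3]]
loc = input_localities(outputs, 4)            # [1, 1, 1, 3]
order = sorted(range(4), key=lambda j: loc[j])
keep = order[:3]                              # drop the heaviest input (index 3)
restricted = restrict(outputs, keep)          # each output now reads one input
```

As in the proof, the output locality never grows, while the maximum input locality of the restricted function drops from 3 to 1.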

5.5 Proof of Theorem 5.1

Let \(\varepsilon >0\) be the given shrinkage parameter. Let \(\varepsilon _{\mathrm {LDPC}}=\varepsilon /2\) and let \({\mathcal {M}}_{\varepsilon _{\mathrm {LDPC}} n \times n}\) be an efficient \(\beta _{\mathrm {LDPC}}\)-LDPC for some constant \(\beta _{\mathrm {LDPC}}>0\) whose existence is promised by Proposition 5.5. We will show how to obtain TCR with shrinkage \(1-\varepsilon /4\) from any \(\delta \)-secure \(\beta \)-RTCR \(\mathcal {H}\) with shrinkage factor \(1-\varepsilon \), where \(\beta + \delta < \beta _{\mathrm {LDPC}}\).

Start by transforming \(\mathcal {H}\) into \(\delta \)-secure \(\beta \)-TCR with shrinkage factor of \(1-\varepsilon \) via Claim 5.2. Then, amplify the security error to negligible by employing Lemma 5.4 with \(t=n\) and \(\gamma =\beta _{\mathrm {LDPC}}-\beta >\delta \). This yields a \({\mathrm {neg}}(n)\)-secure \(\beta _{\mathrm {LDPC}}\)-TCR with shrinkage factor of \(1-\varepsilon \). Next, apply distance amplification (Lemma 5.6) with \({\mathcal {M}}_{\varepsilon _{\mathrm {LDPC}} n \times n}\) and obtain a TCR with standard security, shrinkage factor of \(1-\varepsilon +\varepsilon _{\mathrm {LDPC}}=1-\varepsilon /2\) and constant output locality. Finally, reduce the input locality via Lemma 5.8 (instantiated with \(\alpha =\frac{\varepsilon }{4}\)) at the expense of increasing the shrinkage factor to \(1-\varepsilon /4\). Observe that all these transformations preserve the algebraic degree and so we derive the main part of the theorem. The “furthermore” part now follows immediately from Claim 5.3. \(\square \)
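The parameter bookkeeping in this proof can be double-checked mechanically. The following sketch (ours, not from the paper) tracks the shrinkage factor, i.e., the ratio of output length to input length, through the three transformations.

```python
def pipeline_shrinkage(eps, alpha):
    """Track the shrinkage factor (output/input length) along the proof of Thm 5.1."""
    s = 1 - eps          # the assumed beta-RTCR; Claim 5.2 and Lemma 5.4 preserve it
    s = s + eps / 2      # Lemma 5.6 appends eps/2 * n parity bits (eps_LDPC = eps/2)
    s = s / (1 - alpha)  # Lemma 5.8 shortens the input by an alpha fraction
    return s

s = pipeline_shrinkage(0.4, 0.4 / 4)   # instantiated with alpha = eps/4
```

For every \(\varepsilon \in (0,1)\) and \(\alpha =\varepsilon /4\) the final factor is at most \(1-\varepsilon /4\), since \((1-\varepsilon /2)/(1-\varepsilon /4)\le 1-\varepsilon /4\).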

6 Putting It Together

In this section we combine the results of the previous sections and derive the main theorem and its applications.

6.1 Locally Computable UOWHFs

Theorem 6.1

There exist some universal constants \(\beta >0\) and \(0<\gamma <\frac{1}{2}\) such that for every \((\beta ,\gamma )\)-good predicate P the following holds. Assuming that \(\mathcal {F}_{P,n,c n}\) is \(1/c\)-pseudorandom or that \(\mathcal {F}_{P,n,c' n^3}\) is one-way for some constants \(c=c(P)\) and \(c'=c'(P)\), there exists a locally computable UOWHF \(\mathcal {H}\) with constant shrinkage factor. Moreover, the algebraic degree of \(\mathcal {H}\) is equal to the degree of the predicate P.

By Claim 5.3, one can further reduce the shrinkage factor to an arbitrary constant \(\varepsilon '\in (0,1)\) at the expense of increasing the output/input locality and degree to a larger constant.

Proof

Fix some \(\varepsilon >0\). By Theorem 5.1 it suffices to prove the existence of a \(\delta \)-secure \(\beta \)-RTCR with shrinkage factor of \(1-\varepsilon \) and constant output locality, for some universal constants \(\beta \) and \(\delta \). Let \(\gamma \) be a constant for which Eq. 3 is satisfied with \(\varepsilon \) (e.g., for \(\varepsilon =0.3\) it suffices to let \(\gamma =0.46\)). Let P be a \((\beta ,\gamma )\)-good predicate. By Corollary 4.2, there exists a constant \(k=k(P)\) for which \(\mathcal {F}_{P,n,(1-\varepsilon )n}\) is a \(\delta \)-secure \(\beta \)-RTCR assuming that \(\mathcal {F}_{P,n,kn}\) is \(\delta /3\)-pseudorandom. Taking \(c=\max (k, 3/\delta )\) completes the proof of the first part of the theorem. To prove the second (“one-wayness”) part, we employ Corollary 6.2 of [4], which asserts that for predicates with a sensitive coordinate (as in property 2 of Def. 3.2), \(1/c\)-pseudorandomness of \(\mathcal {F}_{P,n,cn}\) is implied by the one-wayness of \(\mathcal {F}_{P,n,c' n^3}\) for a constant \(c'\) which depends on the constant c and (the locality of) P. \(\square \)

We suggest instantiating the theorem with the predicate \(\mathsf {MST}_{d_1,d_2}\) defined in Eq. 7, which computes the XOR of a \(d_1\)-ary parity and a \(d_2\)-ary AND (over \(d_1+d_2\) distinct inputs). Recall that Lemma 3.3 guarantees that for sufficiently large odd \(d_1\) and every \(d_2\ge 2\) the predicate \(\mathsf {MST}_{d_1,d_2}\) satisfies the goodness condition needed in Theorem 6.1. Concretely, the results of [11] support the following assumption:

Assumption 6.2

For every \(d\ge 3\) the collection \(\mathcal {F}_{\mathsf {MST}_{d,2},n,c n}\) is \(1/c\)-pseudorandom for an arbitrarily large constant c.

In fact, based on our existing knowledge, it seems that the above assumption holds even when c is replaced by \(n^{\varepsilon }\) for some small constant \(\varepsilon >0\). Alternatively, one can start with one-wayness, as captured by the following assumption.

Assumption 6.3

For all sufficiently large constants \(d_1\) and \(d_2\) the collection \(\mathcal {F}_{\mathsf {MST}_{d_1,d_2},n,c n^3}\) is one-way for an arbitrarily large constant c.

Again, based on known attacks, one may conjecture that a much stronger version of the assumption holds; namely, that for every constant c and all sufficiently large constants \(d_1,d_2>d(c)\) the collection \(\mathcal {F}_{\mathsf {MST}_{d_1,d_2},n,n^c}\) is one-way. We further mention that the latter conjecture is supported by the results of [13].

Combined with Theorem 6.1, any of the above assumptions implies the existence of locally computable UOWHF with constant shrinkage factor, and so Theorem 1.1 follows.

6.2 Optimizing the Output Locality

One can further optimize the output locality (while preserving constant input locality and linear shrinkage) via the AIK compiler [7].

Proposition 6.4

If there exists a UOWHF \(\mathcal {H}\) with constant shrinkage factor, constant output locality, and constant input locality, then there exists a UOWHF \(\hat{\mathcal {H}}\) with constant shrinkage factor, constant input locality, and output locality of 4. Moreover, if the algebraic degree of \(\mathcal {H}\) is 2 then the output locality of \(\hat{\mathcal {H}}\) is 3.

Proof

In [7] it is shown that, for some small (universal) constant c, any UOWHF family \(\mathcal {H}:\{0,1\}^n\rightarrow \{0,1\}^{m(n)}\), each of whose output bits is computable by an \(\mathbf {NC^1}\) circuit of size l(n), can be transformed into a UOWHF \(\hat{\mathcal {H}}:\{0,1\}^{n+m(n)\cdot l(n)^c} \rightarrow \{0,1\}^{m(n)+m(n)\cdot l(n)^c}\) with output locality 4. Moreover, in the special case where the degree of \(\mathcal {H}\) is 2, the output locality of \(\hat{\mathcal {H}}\) is 3. (Originally, these implications were proven for collision-resistant hash functions, though the proof easily generalizes to UOWHFs as well.)

In typical applications of [7], l(n) is super-constant, and so the shrinkage \(n-m(n)\) of the resulting UOWHF \(\hat{\mathcal {H}}\) is only sublinear in its input length \(n+m(n)\cdot l(n)^c\). However, when \(\mathcal {H}\) has constant output locality, each output bit is computable by a constant-size circuit and so \(l(n)=O(1)\). In this case, linear shrinkage is preserved: since \(n-m(n)=\Theta (n)\) and \(l(n)=O(1)\), the shrinkage of the resulting UOWHF, which is \(n-m(n)=\Theta (n)\), is still linear in its input length \(n+m(n)\cdot l(n)^c=n+O(n)\). Finally, we note that, when \(\mathcal {H}\) enjoys constant output locality, the above transformation preserves constant input locality as well. \(\square \)

As observed by Goldreich [16], functions with output locality 2 are efficiently invertible (due to the tractability of 2-SAT). Therefore, one cannot hope for output locality smaller than 3. Indeed, such optimal locality can be achieved based on Assumption 6.2.

Corollary 6.5

Under Assumption 6.2 there exists a UOWHF with output locality 3, constant input locality and constant shrinkage factor.

Proof

Since \(\mathsf {MST}_{d,2}\) is a degree 2 predicate, Theorem 6.1 yields a degree-2 locally computable UOWHF with constant shrinkage factor. The corollary now follows from Proposition 6.4. \(\square \)

6.3 Applications

As mentioned in the introduction, locally computable UOWHFs with constant shrinkage factor also allow us to optimize the sequential complexity of cryptography. In the following, we measure the time complexity T(n) of a collection \(\mathcal {H}\) of UOWHFs as the sum of the sampling time and the evaluation time. Namely, T(n) measures the time it takes to sample \(h\mathop {\leftarrow }\limits ^{R}\mathcal {H}\) and evaluate it on a given n-bit input x.

Proposition 6.6

(Fast UOWHFs and Signatures) Assume the existence of a UOWHF \(\mathcal {H}\) with constant shrinkage factor and constant output locality. Then:

  1.

    For every constant \(\varepsilon >0\) there exists a UOWHF \(\hat{\mathcal {H}}:\{0,1\}^n\rightarrow \{0,1\}^{n^{\varepsilon }}\) with polynomial shrinkage factor which is computable in linear time in the RAM model. Furthermore, each function \(h\in \hat{\mathcal {H}}\) is described by a string of length \(O(n^{\varepsilon })\).

  2.

    There exists a digital signature scheme whose time complexity (both for signing and for verifying) is linear in the message length in the RAM model.

Proof

The proof follows the outline of [19] and is given here for completeness. Fix \(\varepsilon >0\), and let \(\mathcal {H}=\left\{ h_k\right\} \) be the underlying locally computable UOWHF whose shrinkage factor is constant. Without loss of generality (by Claim 5.3), we may assume that the collection shrinks n-bit strings to \((n/2)\)-bit strings. Let us denote the RAM complexity of the key-sampling algorithm by \(O(n^c)\) for some constant \(c>0\) and assume, without loss of generality, that \(c>1/\varepsilon \). We further assume that the key k of the collection is simply a canonic description of the (linear-size) \(\mathbf {NC^0}\) circuit which computes the function \(h_k\). (This can always be guaranteed by learning a canonic description of \(h_k\); see [3, Proposition 2.4.1].) Observe that, given k and x, the value of \(h_k(x)\) can be computed in linear time by a RAM machine.

To reduce the time complexity of the sampling procedure (as well as the length of the key), we use the direct-product collection \(\mathcal {H}'=\left\{ h'_k\right\} \), defined by a single key \(k\mathop {\leftarrow }\limits ^{R}\mathcal {K}(1^{n^{1/c}})\) and \(h'_k(x)=(h_k(x^{1}),\ldots ,h_k(x^{t}))\), where \(x\in \{0,1\}^n\) is partitioned into \(t=n^{1-1/c}\) blocks, each of size \(n^{1/c}\). It is not hard to verify that the resulting collection is still a UOWHF with shrinkage factor 1/2, and that the total RAM complexity of sampling a key and evaluating the function is \(t(n)=O(n)\). Furthermore, the description length of the key is \(n^{1/c}\).
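The direct-product collection admits a short sketch. The inner "hash" below is an insecure placeholder (pairwise XOR, ours) used only to show how a single short key function is reused across all blocks of the long input.

```python
def direct_product(h, block_len, x):
    """h'_k(x) = (h_k(x^1), ..., h_k(x^t)) for x split into blocks of block_len bits."""
    assert len(x) % block_len == 0
    out = []
    for i in range(0, len(x), block_len):
        out.extend(h(x[i:i + block_len]))
    return out

def toy_h(block):
    # Insecure 2-to-1 placeholder for the keyed hash h_k; no security is claimed.
    return [block[0] ^ block[1]]

y = direct_product(toy_h, 2, [1, 0, 1, 1, 0, 0])   # 6 bits -> 3 bits
```

A collision in the product yields a collision in some block under the shared key, which is why target collision resistance is preserved.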

As in Claim 5.3, we can amplify the shrinkage by composing \((1-\varepsilon )\log n\) functions from \(\mathcal {H}'\), where the ith function \(h'_{k_i}\) shrinks \(n/2^i\) bits to \(n/2^{i+1}\) bits and \(k_i\) is sampled as above for input length \(n/2^i\). The resulting collection \(\hat{\mathcal {H}}\) is a UOWHF (see [23]) which shrinks n bits to \(n^{\varepsilon }\) bits. The RAM complexity of the ith level is \(t(n/2^i)\), and so the overall complexity is \(T(n)=\sum _{i=0}^{(1-\varepsilon )\log n} t(n/2^i)=O(n)\); the description of the key is of length \(O(n^{\varepsilon })\). This completes the proof of the first item.
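The composition step can be sketched with the same kind of insecure toy halving function (ours); the point is only the bookkeeping: each level halves the length, so \((1-\varepsilon)\log n\) levels map n bits to \(n^{\varepsilon}\) bits, and the per-level costs form a geometric series summing to O(n).

```python
def toy_halve(x):
    # Insecure stand-in for a level function that shrinks m bits to m/2 bits.
    return [x[2 * i] ^ x[2 * i + 1] for i in range(len(x) // 2)]

def compose(x, levels):
    """Apply the halving function `levels` times, as in the iterated collection."""
    for _ in range(levels):
        x = toy_halve(x)
    return x

y = compose([1, 0, 1, 1, 0, 0, 1, 0], 2)   # 8 bits -> 2 bits after 2 levels
```

With n = 8 and two levels, the cost of the levels is proportional to 8 + 4, matching the geometric sum \(\sum_i O(n/2^i)=O(n)\).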

We move on to the proof of the second item. Let (G,S,V) be a standard signature scheme (whose existence follows from the existence of UOWHFs [23]). Assume that the complexity of signing and verification is \(O(n^b)\) for some constant \(b>0\). We define a new signature scheme \((G,S',V')\) by employing the Naor–Yung transformation instantiated with the aforementioned linear-time computable UOWHF \(\mathcal {H}:\{0,1\}^n\rightarrow \{0,1\}^{n^{\varepsilon }}\) whose keys are of length \(O(n^{\varepsilon })\), where \(\varepsilon =1/b\). Namely, to sign an n-bit message m, the new signing algorithm \(S'_{\mathsf {sk}}(m)\) samples a key k for \(\mathcal {H}\) and outputs \((k,S_{\mathsf {sk}}(k,h_k(m)))\). To verify whether a tag \((k,\beta )\) is a valid signature of a document \(m\in \{0,1\}^n\), use \(V_{\mathsf {pk}}\) to check whether \(\beta \) is a valid signature of \((k,h_k(m))\) under the original scheme. The overall complexity in both cases is O(n) (for the hashing) plus \(O(n^{b\varepsilon })=O(n)\) (for applying the original signing/verifying algorithm on an input of length \(O(n^{\varepsilon })\)). This completes the proof of the second item. \(\square \)
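The data flow of this hash-and-sign construction can be sketched as follows. The signature scheme and hash below are insecure stand-ins (illustration only), and the helper names are ours: the point is merely that signing samples a fresh hash key k, signs the pair \((k, h_k(m))\), and verification recomputes the short digest.

```python
def hash_and_sign(sign, h, sample_key, m):
    """S'_sk(m): sample a hash key k and sign (k, h_k(m)) with the original scheme."""
    k = sample_key()
    return (k, sign((k, h(k, m))))

def hash_and_verify(verify, h, m, sig):
    """V'_pk: recompute h_k(m) and verify the inner signature on (k, h_k(m))."""
    k, tag = sig
    return verify((k, h(k, m)), tag)

# Insecure placeholders for G/S/V and for the UOWHF (illustration only):
sample_key = lambda: 5
toy_h      = lambda k, m: (sum(m) * k) % 7
toy_sign   = lambda msg: ("tag", msg)
toy_verify = lambda msg, tag: tag == ("tag", msg)

sig = hash_and_sign(toy_sign, toy_h, sample_key, [1, 0, 1])
ok  = hash_and_verify(toy_verify, toy_h, [1, 0, 1], sig)
```

Since the expensive inner scheme touches only the \(O(n^{\varepsilon})\)-bit pair \((k, h_k(m))\), the overall cost is dominated by the linear-time hash.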

We mention that [19] constructs signatures which are linear-time computable in the (stronger) circuit model, at the expense of a stronger assumption (namely, that random local functions are exponentially one-way).