1 Introduction

The common denominator of password hashing (e.g., as in PKCS#5 [13]) and proofs of work [7, 12] is the requirement for a certain computation to be sufficiently expensive, while still remaining feasible. In this context, “expensive” has traditionally meant high time complexity, but recent hardware advances have shown this requirement to be too weak, with fairly inexpensive tailor-made ASIC devices for Bitcoin mining and password cracking gaining increasingly widespread usage.

In view of this, a much better requirement is memory-hardness, i.e., the product of the memory (a.k.a. space) and the time required to solve the task at hand (this is known as the space-time (ST) complexity) should be large. The ST complexity is widely considered to be a good estimate of the product of the area and the time (AT) complexity of a circuit solving the task [3, 5, 16], and thus increasing ST complexity appears to incur a higher dollar cost for building custom circuits compared to simply increasing the required raw computing power alone. Motivated by this observation, Percival [16] developed scrypt, a candidate memory-hard function for password hashing and key derivation which has been well received in practice (e.g., it underlies the Proof of Work protocols of LiteCoin [14], one of the currently most prevalent cryptocurrencies in terms of market capitalization [1]). This has made memory-hardness one of the main desiderata in candidates for the recent password-hashing competition, including its winner, Argon2 [4]. Dziembowski et al. [9] introduce the concept of proofs of space (PoSpace), where the worker (or miner) can either dedicate a large amount of storage space, and then generate proofs extremely efficiently, or otherwise must pay a large time cost for every proof generated. The PoSpace protocol has also found its way into a recent proposal for digital currency [15].

Our contributions, in a nutshell. Cryptanalytic attacks [3, 5, 6, 17] targeting candidate memory-hard functions [2, 4, 11, 17] have motivated the need for developing constructions with provable security guarantees. With the exception of [3], candidate memory-hard functions either come without security proofs, or (e.g., [11, 16, 17]) are analyzed only for a severely restricted class of algorithms and complexity notions, as we discuss below. A primary goal of this paper is to advance the foundations of memory-hardness, and we make progress along several fronts.

We develop a new class of probabilistic pebbling games on graphs – called entangled pebbling games – which are used to prove results on the memory-hardness of tasks such as computing scrypt for large non-trivial classes of adversaries. Moreover, we show how to boost these results to hold against arbitrary adversaries in the parallel random oracle model (pROM) [3] under the conjecture that a new combinatorial quantity which we introduce is (sufficiently) bounded.

A second application of the techniques introduced in this paper concerns Proofs of Space. We show that time lower bounds on the pebbling complexity of a graph imply time lower bounds in the pROM model against any adversary. The quantitative bounds we get depend on the combinatorial value we introduce, and, assuming our conjecture, are basically tight. This solves, modulo the conjecture, the main problem left open in the Proofs of Space paper [9].

Sequentially memory-hard functions. Recall that scrypt uses a hash function \(\mathsf{h}: \{0,1\}^* \rightarrow \{0,1\}^w\) (e.g., SHA-256), and proceeds in two phases, given an input X. It first computes \(X_i = \mathsf{h}^{i}(X)\) for all \(i \in [n]\), and with \(S_0 = X_n\), it then computes \(S_1, \ldots , S_n\) where

$$\begin{aligned} S_i = \mathsf{h}(S_{i-1} \oplus X_{\mathsf {int}(S_{i-1})}) \end{aligned}$$

where \(\mathsf {int}(S)\) reduces a w-bit string S to an integer in [n]. The final output is \(S_n\). Note that it is possible to evaluate scrypt on input X using \(n \cdot w\) bits of memory and in time linear in n, by keeping the values \(X_1, \ldots , X_n\) stored in memory once they are computed. However, the crucial point is that there is no apparent way to save memory – for example, to compute \(S_{i}\), we need to know \(X_{\mathsf {int}(S_{i-1})}\), and under the assumption that \(\mathsf {int}(S_{i-1})\) is (roughly) uniformly random in [n], an evaluator without memory needs to do linear work (in n) to recover this value before continuing with the execution. This gives a constant-memory, \(O(n^2)\)-time algorithm to evaluate scrypt. In fact, as stated by Percival [16], the actual hope is that no matter how much time T(n) and how much memory S(n) an adversarial evaluator invests, we always have \(S(n) \cdot T(n) \ge n^{2 - \epsilon }\) for all \(\epsilon > 0\), even if the evaluator can parallelize its computation arbitrarily.
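For concreteness, the following minimal Python sketch implements this naive linear-memory evaluation; SHA-256 stands in for \(\mathsf{h}\), and \(\mathsf {int}(\cdot)\) is taken as a plain reduction modulo n (the indexing and reduction details of the actual scrypt specification differ).

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for the hash function h (SHA-256, so w = 256).
    return hashlib.sha256(data).digest()

def naive_scrypt_core(X: bytes, n: int) -> bytes:
    # Phase 1: compute and keep all X_i = h^i(X), using n * w bits of memory.
    Xs, cur = [], X
    for _ in range(n):
        cur = h(cur)
        Xs.append(cur)
    # Phase 2: S_i = h(S_{i-1} XOR X_{int(S_{i-1})}), starting from S_0 = X_n.
    S = Xs[-1]
    for _ in range(n):
        j = int.from_bytes(S, "big") % n   # int(S): here simply S mod n (0-based)
        S = h(bytes(a ^ b for a, b in zip(S, Xs[j])))
    return S  # the final output S_n

print(naive_scrypt_core(b"password", 64).hex())
```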

Percival’s analysis of scrypt assumes that \(\mathsf{h}\) is a random oracle. The analysis is limited in two ways: (1) It considers only adversaries which store nothing but random oracle outputs in their memory. (2) The bound measures memory complexity in terms of the maximum memory usage S(n). The latter is undesirable, since the ultimate goal of an adversary performing a brute-force attack is to evaluate scrypt on as many inputs as possible, and if the large memory usage is limited to a small fraction of the computing time, a much higher amortized complexity can be achieved.

Alwen and Serbinenko (AS) [3] recently addressed these shortcomings, and delivered provably sequentially memory-hard functions in the so-called parallel random oracle model (pROM), developing new and better complexity metrics tailored to capturing amortized hardness. While their work falls short of delivering guarantees for scrypt-like functions, it serves as an important starting point for our work, and we give a brief overview.

From sequential memory-hardness to pebbling. AS consider adversaries attempting to evaluate a function \(\mathcal {H}^\mathsf{h}\) (which makes calls to some underlying hash function \(\mathsf{h}\), modeled as a random oracle). These adversaries proceed in rounds: in each round i, the adversary can make an unbounded number of parallel queries to \(\mathsf{h}\), and then pass on a state \(\sigma _i\) to the next round. The complexity of the adversary is captured by its cumulative memory complexity (CMC) given by \(\sum _{i} |\sigma _i|\). One then denotes as \(\mathsf{cmc}^{\texttt {pROM}}(\mathcal {H})\) the expected CMC of the best adversary, where the expectation is over the choice of the RO \(\mathsf{h}\) and the coins of the adversary. We stress that CMC exhibits some very important features: First, a lower bound on CMC appears to yield a reasonable lower bound on the AT complexity metric. Second, in contrast to the ST complexity, the CMC of a task also gives us a lower bound on the electricity consumption of performing the task. This is because storing data in volatile memory for, say, the time it takes to evaluate \(\mathsf{h}\) consumes a significant amount of electricity. Thus CMC tells us something not only about the dollar cost of building a custom circuit for computing a task, but also about the dollar cost of actually running it. While the former can be amortized over the life of the device, the latter represents a recurring fee.

AS study sequentially memory-hard functions naturally defined by a single-source and single-sink directed acyclic graph (DAG) \(G = (V, E)\). The label of a vertex \(i\in V\) with parents \(\{p_1,\ldots ,p_d\}\) (i.e., \((p_j,i)\in E\) for \(j=1,\ldots ,d\)) is defined as \( \ell _i=\mathsf{h}(i,\ell _{p_1},\ldots ,\ell _{p_d}). \) Note that the labels of all vertices can be recursively computed starting with the sources. The function \(\mathsf{label}(G,\mathsf{h})\) is now simply the label \(\ell _v\) of the sink v. There is a natural connection between \(\mathsf{cmc}^{\texttt {pROM}}(\mathsf{label}(G,\mathsf{h}))\) for a randomly chosen \(\mathsf{h}\) and the cumulative pebbling complexity (CC) of the graph G. CC is defined in a game where one can place pebbles on the vertices of V, according to the following rules: In every step of the game, new pebbles can be placed on any vertex v for which all parents of v have pebbles on them (in particular, pebbles can always be placed on sources), and pebbles can always be removed. The game is won when a pebble has been placed on the sink. The CC of a strategy for pebbling G is defined as \(\sum _{i} |S_i|\), where \(S_i\) is the set of vertices on which a pebble is placed at the end of the \(i^{th}\) step, and the CC of G – denoted \(\textsf {cc}(G)\) – is the CC of the best strategy.
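These pebbling rules and the CC of a given strategy are straightforward to operationalize. Below is a minimal Python sketch, with our own representation of a strategy as a list of configurations (each a set of vertices), that checks legality and adds up the CC.

```python
def pebbling_cc(parents, sink, steps):
    # `parents` maps each vertex to the set of its parents (sources map to set()).
    # `steps` is the strategy: the configuration S_i after each step.
    prev, cost = set(), 0
    for conf in steps:
        for v in conf - prev:                  # newly placed pebbles...
            assert parents[v] <= prev, f"illegal pebble on {v}"  # ...need pebbled parents
        prev = conf
        cost += len(conf)                      # CC = sum of configuration sizes
    assert any(sink in conf for conf in steps), "sink never pebbled"
    return cost

# A path on 4 vertices, keeping only the most recent pebble: CC = 4.
parents = {1: set(), 2: {1}, 3: {2}, 4: {3}}
print(pebbling_cc(parents, sink=4, steps=[{1}, {2}, {3}, {4}]))
```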

Indeed, \(\textsf {cc}(G)\) captures the CMC of restricted pROM adversaries computing \(\mathsf{label}(G,\mathsf{h})\) for which every state \(\sigma _i\) only consists of random oracle outputs, i.e., of vertex labels. A pebble on v is equivalent to the fact that \(\sigma _i\) contains \(\ell _v\). However, a full-fledged pROM adversary has no reason to be restricted to such a strategy – it could for example store as part of its state \(\sigma _i\) a particular encoding of the information accumulated so far. Nonetheless, AS show that (up to a negligible extent) such additional freedom does not help in computing \(\mathsf{label}(G,\mathsf{h})\). They complement this with an efficiently constructible class of constant-degree DAGs \(G_n\) on n vertices such that \(\textsf {cc}(G_n) = \varOmega (n^2/\mathrm {polylog}(n))\).

Unfortunately, the framework of [3] does not extend to functions like scrypt, which are data dependent, i.e., the values which need to be input to \(\mathsf{h}\) are determined at run-time. While this makes the design far more intuitive, AS’s techniques crucially rely on the relationship between intermediate values in the computation being laid out a priori in a data-independent fashion.

Our contributions. This paper validates the security of scrypt-like functions with two types of results – results for restricted adversaries, as well as results for arbitrary adversaries under a combinatorial conjecture. Our results also have direct implications on proofs of space, but we postpone this discussion to ease presentation.

(1) Probabilistic pebbling games. We introduce a generalization \(\mathsf{pebble}\) of pebbling games on a DAG \(G = (V, E)\) with dynamic challenges uniformly sampled from a set \(C \subseteq V\). With the same pebbling rules as before, we now proceed over n rounds, and at every round, a challenge \(c_i\) is drawn uniformly at random from C. The player’s goal is to place a pebble on \(c_i\) before moving to the next round and learning the next challenge \(c_{i+1}\). The game terminates when the last challenge has been covered by a pebble. One can similarly associate with G a labeling game \(\mathsf{computeLabel}\) in the pROM, where the goal is instead to compute the label \(\ell _{c_i}\) of \(c_i\), rather than placing a pebble on it. For instance, the computation of scrypt is tightly connected to the game \(\mathsf{computeLabel}\) played on the line graph \(L_n\) with vertices \([n] = \{1, 2, \ldots , n\}\), edges \(\{(i, i + 1) : i \in [n - 1]\}\), and challenges \(C = [n]\) (as detailed in Sect. 2.5). The labels to be computed in this game are those needed to advance the computation in the second half of the scrypt computation, and the challenges (in the actual scrypt function) are computed from hash-function outputs.

In fact, it is not hard to see that in \(\mathsf{computeLabel}\) for some graph G, a pROM adversary that only stores random-oracle-generated outputs can easily be turned into a player for \(\mathsf{pebble}\) on the graph G. This is in particular true for \(G = L_n\), and thus lower bounding the CC of an adversary playing \(\mathsf{pebble}\) on \(L_n\) also yields a lower bound on the CMC of computing (the second half of) scrypt. Our first result provides such a lower bound.

Theorem 1. For any constant \(\delta > 0\), the CC of an adversary playing \(\mathsf{pebble}\) on the line graph \(L_n\) with challenges [n] is \(\varOmega _\delta (n^{2}/\log ^2(n))\) with probability \(1 - \delta \) over the choice of all challenges.

To appreciate this result, it should be noted that it inherently relies on the choice of the challenges being independent of the adversary playing the game – indeed, if the challenges are known a priori, techniques from [3] directly give a strategy with CC \(O(n^{1.5})\) for the above game. Also, this result already improves on Percival’s analysis (which, implicitly, places similar restrictions on the class of pROM algorithms considered), as Theorem 1 uses the CC of the (simple) pebbling of a graph, and thus it actually generalizes to a lower bound on the amortized complexity of computing multiple scrypt instances in the pROM.

(2) Entangled pebbling. The above result is an important first step – to the best of our knowledge, all known evaluation attacks against memory-hard functions store hash labels either directly or not at all, and thus fit into this model – but we ask whether the model can be strengthened. For example, an adversary could store the XOR \(\ell _i\oplus \ell _j\) of two labels (which only takes w bits) and, depending on possible futures of the game, recover both labels given any one of them. As we will see, this can help. As a middle ground between capturing pROM security for arbitrary adversaries and the above pebbling adversaries, we introduce a new class of pebbling games, called entangled pebbling games, which constitutes a combinatorial abstraction for such adversaries.

In such games, an adversary can place on a set \(\mathcal{Y}\subseteq V\) an “entangled pebble” \({\langle \mathcal{Y} \rangle _{t}}\) for some integer \(0 \le t \le \left| \mathcal{Y} \right| \). The understanding here is that placing individual pebbles on any t vertices of \(\mathcal{Y}\) – where an individual pebble on v is the special case \({\langle \{v\} \rangle _{0}}\) of an entangled pebble – is equivalent to having individual pebbles on all vertices in \(\mathcal{Y}\). The key point is that keeping an entangled pebble \({\langle \mathcal{Y} \rangle _{t}}\) costs only \(|\mathcal{Y}| - t\), and depending on the challenges, we may take different choices as to which t pebbles we use to “disentangle” \({\langle \mathcal{Y} \rangle _{t}}\). Also, note that in order to create such an entangled pebble, every element of \(\mathcal{Y}\) must either carry an individual pebble, or such a pebble must be easily obtainable by disentangling existing entangled pebbles.

In the pROM labeling game, an entangled pebble \({\langle \mathcal{Y} \rangle _{t}}\) corresponds to an encoding of length \(w\cdot (|\mathcal{Y}|-t)\) of the w-bit labels \(\{\ell _{i}\ :\ i\in \mathcal{Y}\}\) such that given any t of those labels, we can recover all the remaining ones. Such an encoding can be obtained as follows: Let \(\mathcal{Y}=\{y_1,\ldots ,y_d\}\), and fix \(2d - t\) distinct elements \(x_1, \ldots , x_{2d - t}\) of the finite field \(\mathbb {F}_{2^w}\) (whose elements are represented as w-bit strings). Consider the (unique) polynomial p(.) of degree \(d-1\) over \(\mathbb {F}_{2^w}\) such that

$$ \forall i\in [d]\ :\ p(x_i)=\ell _{y_i}. $$

The encoding now simply contains \(\{p(x_{d+1}),\ldots ,p(x_{2d-t})\}\), i.e., the evaluations of this polynomial on \(d-t\) points. Note that given this encoding and any t labels \(\ell _{y_i}\), we have the evaluation of p(.) on d points, and thus can reconstruct p(.). Once we know p(.), we can compute all the labels \(\ell _{y_i}=p(x_i)\) for \(y_i \in \mathcal{Y}\).
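To illustrate, here is a minimal Python sketch of this encoding. For simplicity it works over a prime field (with the integers \(0,\ldots,2d-t-1\) as interpolation points) instead of \(\mathbb {F}_{2^w}\), but the mechanics are identical: store \(d-t\) extra evaluations, and any t labels then determine the whole polynomial.

```python
P = 2**61 - 1  # a prime modulus, standing in for the field GF(2^w)

def lagrange_eval(points, x):
    # Evaluate at x the unique polynomial through the given (xi, yi) pairs mod P.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^{-1} via Fermat
    return total

def encode(labels, t):
    # Interpolate p with p(i) = labels[i] (degree d-1); store d-t extra evaluations.
    d = len(labels)
    pts = list(enumerate(labels))
    return [lagrange_eval(pts, x) for x in range(d, 2 * d - t)]

def decode(encoding, known, d):
    # `known`: any t of the d labels, as {index: label}; recovers all d labels.
    pts = list(known.items()) + list(enumerate(encoding, start=d))
    return [lagrange_eval(pts, i) for i in range(d)]

labels = [7, 11, 13, 17]                          # d = 4 toy labels
enc = encode(labels, t=2)                         # only d - t = 2 elements stored
assert decode(enc, {0: 7, 3: 17}, d=4) == labels  # any 2 labels recover all 4
```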

In general, we prove (in the full version) that entangled pebbling is strictly more powerful (in terms of minimizing the expected CC) than regular pebbling. Fortunately, we will also show that for the probabilistic pebbling game on the line graph \(L_n\) entangled pebbling cannot outperform regular ones.

Theorem 2. For any constant \(\delta > 0\), the CC of an entangled pebbling adversary playing \(\mathsf{pebble}\) on graph \(L_n\) is \(\varOmega _\delta (n^{2}/\log ^2(n))\) with probability \(1 - \delta \) over the choice of all challenges.

Interestingly, the proof is a simple adaptation of the proof for the non-entangled case. This result can again be interpreted as providing a guarantee in the labeling game in the pROM for \(L_n\) for the class of adversaries that can be abstracted by entangled pebbling strategies.

(3) Arbitrary Adversaries. So far we have only discussed (entangled) pebbling lower bounds, which then imply lower bounds for restricted adversaries in the pROM model. In Sect. 4 we consider security against arbitrary adversaries. Our main results there show that there is a tight connection between the complexity of playing \(\mathsf{computeLabel}\) and a combinatorial quantity \(\gamma _n\) that we introduce. We show two results. The first lower-bounds the time complexity of playing \(\mathsf{computeLabel}\) for any graph G while the second lower-bounds the CMC of playing \(\mathsf{computeLabel}\) for \(L_n\) (and thus scrypt).

  1.

    For any DAG \(G = (V,E)\) with \(|V| = n\), with high probability over the choice of the random hash function \(\mathsf{h}\), the pROM time complexity to play \(\mathsf{computeLabel}\) for graph G using \(\mathsf{h}\), for any number of challenges, when starting with any state of size \(k \cdot w\), is (roughly) at least the time complexity needed to play \(\mathsf{pebble}\) on G with the same number of challenges and starting with an initial pebbling of size roughly \(\gamma _n \cdot k\).

  2.

    The pROM CMC for \(\mathsf{computeLabel}\) for \(L_n\) is \(\varOmega (n^{2}/(\gamma _n \cdot \log ^2(n)))\).

At this point, we do not have any non-trivial upper bound on \(\gamma _n\), but we conjecture that \(\gamma _n\) grows very slowly (if at all) as a function of n. The best lower bound we have is \(\gamma _5 > 3/2\). Note that \(\gamma _n\) does not need to be constant in n – we would get non-trivial statements even if \(\gamma _n\) were to grow moderately as a function of n, e.g., \(\gamma _n = \mathrm {polylog}(n)\) or \(\gamma _n = n^{\epsilon }\) for some small \(\epsilon > 0\).

Therefore, assuming our conjecture on \(\gamma _n\), the first result in fact solves the main open problem from the work of Dziembowski et al. [9] on proofs of space. The second result yields, in particular, a near-quadratic lower bound on the CMC of evaluating scrypt for arbitrary pROM adversaries.

2 Pebbling, Entanglement, and the pROM

In this section, we first present both a notion of parallel pebbling of graphs with probabilistic challenges, and then extend this to our new notion of entangled pebbling games. Next, we discuss some generic relations between entangled and regular pebbling, before finally turning to defining the parallel random-oracle model (pROM), and associated complexity metrics.

Throughout, we use the following notation for common sets: \({\mathbb {N}}:=\{0,1,2,\ldots \}\), \({\mathbb {N}^{+}}:= {\mathbb {N}}\setminus \{0\}\), \({\mathbb {N}}_{\le c} := \{0,1,\ldots ,c\}\) and \([c]:=\{1,2,\ldots ,c\}\). For a distribution \({{\mathcal {D}}}\) we write \(x\leftarrow {{\mathcal {D}}}\) to denote sampling x according to \({{\mathcal {D}}}\) in a random experiment.

Fig. 1. Description of the m-round, probabilistic parallel pebbling game.

2.1 Probabilistic Graph Pebbling

Throughout, let \(G=(V,E)\) denote a directed acyclic graph (DAG) with vertex set \(V=[n]\). For a vertex \(i \in V\), we denote by \(\mathsf{parent}(i)=\{j\in V\ :\ (j,i)\in E\}\) the parents of i. The m-round, probabilistic parallel pebbling game played by a player \(\mathtt {T}\) on a graph \(G=(V,E)\) with challenge nodes \(C\subseteq V\) is defined in Fig. 1. The cumulative black pebbling complexity is defined as

$$\begin{aligned} \begin{aligned} \textsf {cc}(G, C, m, \mathtt {T}, {P_\mathsf{init}})&:= \underset{\mathsf{pebble}(G,C,m,\mathtt {T},{P_\mathsf{init}})}{\mathbb {E}}{\left[ \mathsf{cost}\right] } \\ \textsf {cc}(G, C, m, k)&:= \min _{\underset{|{P_\mathsf{init}}|\le k}{\mathtt {T},{P_\mathsf{init}}\subseteq V}}\left\{ \textsf {cc}(G, C, m, \mathtt {T}, {P_\mathsf{init}}) \right\} \end{aligned} \end{aligned}$$

Similarly, the time cost is defined as

$$\begin{aligned} \begin{aligned} \mathsf{time}(G,C,m,\mathtt {T},{P_\mathsf{init}})&:= \underset{\mathsf{pebble}(G,C,m,\mathtt {T},{P_\mathsf{init}})}{\mathbb {E}}\left[ {\mathsf{cnt}}\right] \\ \mathsf{time}(G,C,m,k)&:= \min _{\underset{|{P_\mathsf{init}}|\le k}{\mathtt {T},{P_\mathsf{init}}\subseteq V}}\left\{ \mathsf{time}(G, C, m, \mathtt {T}, {P_\mathsf{init}}) \right\} \end{aligned} \end{aligned}$$
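A run of this game is easy to simulate. The following Python sketch is our own minimal rendering of the rules behind Fig. 1 – one parallel step per iteration, new pebbles only on nodes with pebbled parents, a fresh uniform challenge per round – with \(\mathsf{cost}\) and \(\mathsf{cnt}\) tracked as above; the greedy player shown for \(L_4\) is just for illustration.

```python
import random

def pebble_game(parents, C, m, player, P_init):
    # One run of pebble(G, C, m, T, P_init); returns (cost, cnt).
    P, cost, cnt = set(P_init), 0, 0
    for _ in range(m):
        c = random.choice(list(C))               # fresh uniform challenge
        while True:
            newP = player(P, c)
            for v in newP - P:                   # new pebbles need pebbled parents
                assert parents[v] <= P, f"illegal pebble on {v}"
            P = newP
            cost += len(P)                       # cost: sum of configuration sizes
            cnt += 1                             # cnt: total number of steps
            if c in P:                           # challenge covered: next round
                break
    return cost, cnt

# Greedy player on the line graph L_4: extend from the highest pebble <= c.
parents = {1: set(), 2: {1}, 3: {2}, 4: {3}}
def greedy(P, c):
    if c in P:
        return P                                 # a covered challenge still costs one step
    j = max([v for v in P if v <= c], default=0)
    return P | {j + 1}

print(pebble_game(parents, C=range(1, 5), m=4, player=greedy, P_init={1}))
```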

The above notions consider the expected cost of a pebbling; thus, even if, say, \(\textsf {cc}(G,C,m,k)\) is very large, this could be due to the fact that for a tiny fraction of challenge sequences the complexity is very high, while for all other sequences it is very low. To get more robust security notions, we will define a more fine-grained notion which guarantees that the complexity is high on all but some \(\epsilon \) fraction of the runs.

$$\begin{aligned} \textsf {cc}_{\epsilon }(G,C,m,\mathtt {T},{P_\mathsf{init}}):= & {} \sup \left\{ \gamma \ :\ \underset{\mathsf{pebble}(G,C,m,\mathtt {T},{P_\mathsf{init}})}{\mathbb {P}}\left[ {\mathsf{cost}}\ge \gamma \right] \ge 1-{\epsilon }\right\} \\ \textsf {cc}_{\epsilon }(G,C,m,k):= & {} \min _{\underset{|{P_\mathsf{init}}|\le k}{\mathtt {T},{P_\mathsf{init}}\subseteq V}} \left\{ \textsf {cc}_{\epsilon }(G,C,m,\mathtt {T},{P_\mathsf{init}})\right\} \\ \mathsf{time}_{\epsilon }(G,C,m,\mathtt {T},{P_\mathsf{init}}):= & {} \sup \left\{ \gamma \ :\ \underset{\mathsf{pebble}(G,C,m,\mathtt {T},{P_\mathsf{init}})}{\mathbb {P}}\left[ {\mathsf{cnt}}\ge \gamma \right] \ge 1-{\epsilon }\right\} \\ \mathsf{time}_{\epsilon }(G,C,m,k):= & {} \min _{\underset{|{P_\mathsf{init}}|\le k}{\mathtt {T},{P_\mathsf{init}}\subseteq V}} \left\{ \mathsf{time}_{\epsilon }(G,C,m,\mathtt {T},{P_\mathsf{init}})\right\} \end{aligned}$$

In general, we cannot upper bound cc in terms of \(\textsf {cc}_{\epsilon }\) if \(\epsilon >0\) (same for \(\mathsf{time}\) in terms of \(\mathsf{time}_\epsilon \)), but in the other direction it is easy to show that

$$ \textsf {cc}(G,C,m,\mathtt {T},{P_\mathsf{init}})\ge \textsf {cc}_{\epsilon }(G,C,m,\mathtt {T},{P_\mathsf{init}})(1-\epsilon ) $$

2.2 Entangled Graph Pebbling

In the above pebbling game, a node is always either pebbled or not, and there is only one type of pebble, which we will henceforth refer to as a “black” pebble. We will now introduce a more general game, where \(\mathtt {T}\) can put “entangled” pebbles.

A t-entangled pebble, denoted \({\langle \mathcal{Y} \rangle _{t}}\), is defined by a subset of nodes \(\mathcal{Y}\subseteq ~[n]\) together with an integer \(t\in {\mathbb {N}}_{\le |\mathcal{Y}|}\). Having black pebbles on all nodes of \(\mathcal{Y}\) corresponds to the special case \({\langle \mathcal{Y} \rangle _{0}}\). Entangled pebbles \({\langle \mathcal{Y} \rangle _{t}}\) have the following behaviour: once any subset of \(\mathcal{Y}\) of size (at least) t contains black pebbles, all \(v\in \mathcal{Y}\) immediately receive a black pebble (regardless of whether their parents already contained black pebbles or not). We define the weight of an entangled pebble as:

$${|{\langle \mathcal{Y} \rangle _{t}}|_{{\updownarrow }}}:=|\mathcal{Y}|-t.$$

More generally, an (entangled) pebbling configuration is defined as a set \(P=\{{\langle \mathcal{Y}_1 \rangle _{t_1}},\ldots ,{\langle \mathcal{Y}_s \rangle _{t_s}}\}\) of entangled pebbles, and its weight is

$${|P|_{{\updownarrow }}}:= \sum _{i\in [s]} {|{\langle \mathcal{Y}_i \rangle _{t_i}}|_{{\updownarrow }}}.$$

The rules governing how a pebbling configuration \(P_\mathsf{cnt}\) can be updated to configuration \(P_{\mathsf{cnt}+1}\) – which previously were given by the simple property of Eq. (1) – are now a bit more involved. To describe them formally, we need the following definition.

Definition 1

(Closure). The closure of an entangled pebbling configuration \(P=\{{\langle \mathcal{Y}_1 \rangle _{t_1}}, \ldots ,{\langle \mathcal{Y}_s \rangle _{t_s}}\}\) – denoted \(\mathsf{closure}(P)\) – is defined recursively as follows: initialise \(\varLambda =\emptyset \) and then

$$\text {while}\ \exists j\in [s]\ :\ (\mathcal{Y}_j \not \subseteq \varLambda ) \wedge (|\varLambda \cap \mathcal{Y}_j|\ge t_j)\ \text {set}\ \varLambda :=\varLambda \cup \mathcal{Y}_j$$

once \(\varLambda \) cannot be further extended using the rule above, we define \(\mathsf{closure}(P)=\varLambda \).
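This fixed-point computation is short enough to write out. The following Python sketch, with our own representation of a configuration as a list of (set, threshold) pairs, computes the closure.

```python
def closure(P):
    # P is an entangled configuration: a list of (Y, t) pairs, Y a set of nodes.
    Lam, changed = set(), True
    while changed:
        changed = False
        for Y, t in P:
            if not Y <= Lam and len(Lam & Y) >= t:  # Y disentangles: add all of it
                Lam |= Y
                changed = True
    return Lam

# Two black pebbles (0-entangled) on 1 and 2 disentangle <{1,2,3}>_2:
print(closure([({1}, 0), ({2}, 0), ({1, 2, 3}, 2)]))  # {1, 2, 3}
```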

Note that \(\mathsf{closure}(P)\) is non-empty iff P contains at least one entangled pebble \({\langle \mathcal{Y} \rangle _{t}}\) with \(t=0\). Equipped with this notion, we can now specify how a given pebbling configuration can be updated.

Definition 2

(Valid Update). Let \(P=\{{\langle \mathcal{Y}_1 \rangle _{t_1}},\ldots ,{\langle \mathcal{Y}_s \rangle _{t_s}}\}\) be an entangled pebbling configuration. Further,

  • Let \({\mathcal{V}_1}:=\mathsf{closure}(P)\).

  • Let \({\mathcal{V}_2}:=\{i \ :\ \mathsf{parent}(i)\subseteq \mathcal{V}_1\}\). These are the nodes that can be pebbled using the black pebbling rules (Eq. 1).

Now \(P' = \{{\langle \mathcal{Y}'_1 \rangle _{t'_1}},\ldots ,{\langle \mathcal{Y}'_{s'} \rangle _{t'_{s'}}}\}\) is a valid update of P if every \({\langle \mathcal{Y}'_{j'} \rangle _{t'_{j'}}}\) satisfies one of the following two conditions (see the sketch after this list):

  1.

    \(\mathcal{Y}'_{j'}\subseteq (\mathcal{V}_1\cup \mathcal{V}_2)\).

  2.

    \(\exists i\) with \(\mathcal{Y}'_{j'}=\mathcal{Y}_i\) and \(t'_{j'}\ge t_i\). That is, \({\langle \mathcal{Y}'_{j'} \rangle _{t'_{j'}}}\) is an entangled pebble \({\langle \mathcal{Y}_i \rangle _{t_i}}\) that is already in P, but where we potentially have increased the threshold from \(t_i\) to \(t'_{j'}\).
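Continuing the Python sketch above, Definition 2 translates directly into a validity check, reusing closure() (here `parents` maps every node to the set of its parents):

```python
def valid_update(P, P_new, parents):
    # Is P_new a valid update of the entangled configuration P (Definition 2)?
    V1 = closure(P)
    V2 = {v for v, ps in parents.items() if ps <= V1}     # pebbleable by black rules
    for Y, t in P_new:
        cond1 = Y <= V1 | V2                              # condition 1
        cond2 = any(Y == Yi and t >= ti for Yi, ti in P)  # condition 2
        if not (cond1 or cond2):
            return False
    return True
```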

Fig. 2. The entangled pebbling game \(\mathsf{pebble^\updownarrow }(G,C,m,\mathtt {T})\).

The entangled pebbling game \(\mathsf{pebble^\updownarrow }(G,C,m,\mathtt {T})\) is now defined like the game \(\mathsf{pebble}(G,C,m,\mathtt {T})\) above, except that \(\mathtt {T}\) is allowed to choose entangled pebblings. We give it in Fig. 2. The cumulative entangled pebbling complexity and the entangled time complexity of this game are defined analogously to those of the simple pebbling game – we just replace cc with \(\mathsf{cc}^\updownarrow \) and \(\mathsf{time}\) with \(\mathsf{time}^\updownarrow \) in our notation to account for entanglement being considered. In the full version, we show that entanglement can indeed improve the cumulative complexity with respect to unentangled pebbling. However, in the next section, we will show that this is not true with respect to time complexity.

2.3 Entanglement Does Not Improve Time Complexity

We show that in terms of time complexity, entangled pebbles are no more efficient than normal pebbles.

Lemma 3

(Entangled Time \(=\) Simple Time). For any \(G, C, m, \mathtt {T}^\updownarrow , {P_\mathsf{init}}^\updownarrow \) and \(\epsilon \ge 0\), there exist \(\mathtt {T},{P_\mathsf{init}}\) such that \(|{P_\mathsf{init}}| \le {|{P_\mathsf{init}}^\updownarrow |_{{\updownarrow }}}\) and

$$\begin{aligned} \mathsf{time}(G,C,m,\mathtt {T},{P_\mathsf{init}})\le & {} \mathsf{time}^\updownarrow (G,C,m,\mathtt {T}^\updownarrow ,{P_\mathsf{init}}^\updownarrow )\end{aligned}$$
(2)
$$\begin{aligned} \mathsf{time}_\epsilon (G,C,m,\mathtt {T},{P_\mathsf{init}})\le & {} \mathsf{time}^\updownarrow _\epsilon (G,C,m,\mathtt {T}^\updownarrow ,{P_\mathsf{init}}^\updownarrow ) \end{aligned}$$
(3)

in particular

$$\begin{aligned} \mathsf{time}^\updownarrow (G,C,m,k)=\mathsf{time}(G,C,m,k) \qquad \mathsf{time}^\updownarrow _\epsilon (G,C,m,k)=\mathsf{time}_\epsilon (G,C,m,k) \end{aligned}$$
(4)

Proof

The \(\ge \) directions in Eq. (4) follow directly from the fact that a black pebbling is a special case of an entangled pebbling. The \(\le \) directions follow from Eqs. (2) and (3). Below we prove Eq. (2); the proof of Eq. (3) is almost analogous.

We say that a player \({\mathtt {A}}_\mathsf{greedy}\) for a normal or entangled pebbling game is “greedy” if its strategy is simply to pebble everything possible in every round and never remove pebbles. Clearly, \({\mathtt {A}}_\mathsf{greedy}\) is optimal for time complexity, i.e.,

$$\begin{aligned} \forall G,C,m,{P_\mathsf{init}}: & {} \min _\mathtt {T}\mathsf{time}(G,C,m,\mathtt {T},{P_\mathsf{init}})= \mathsf{time}(G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}})\end{aligned}$$
(5)
$$\begin{aligned} \forall G,C,m,{P_\mathsf{init}}^\updownarrow: & {} \min _\mathtt {T}\mathsf{time}^\updownarrow (G,C,m,\mathtt {T},{P_\mathsf{init}}^\updownarrow )= \mathsf{time}^\updownarrow (G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^\updownarrow ) \end{aligned}$$
(6)

We next describe how to derive an initial black pebbling \({P_\mathsf{init}}^*\) from an entangled pebbling \({P_\mathsf{init}}^\updownarrow \) of cost \(|{P_\mathsf{init}}^*|\le {|{P_\mathsf{init}}^\updownarrow |_{{\updownarrow }}}\) such that

$$\begin{aligned} \mathsf{time}(G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^*)\le \mathsf{time}^\updownarrow (G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^\updownarrow ) \end{aligned}$$
(7)

Note that this then proves Eq. (2) (with \({\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^*\) being \(\mathtt {T}, {P_\mathsf{init}}\) in the statement of the lemma) as

$$\begin{aligned} \mathsf{time}^\updownarrow (G,C,m,\mathtt {T}^\updownarrow ,{P_\mathsf{init}}^\updownarrow )\ge & {} \mathsf{time}^\updownarrow (G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^\updownarrow )\end{aligned}$$
(8)
$$\begin{aligned}\ge & {} \mathsf{time}(G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^*) \end{aligned}$$
(9)

It remains to prove Eq. (7). For every entangled pebble \({\langle \mathcal{Y} \rangle _{t}}\in {P_\mathsf{init}}^\updownarrow \), we observe which \(|\mathcal{Y}|-t\) pebbles are the last ones to become available in the random experiment \(\mathsf{pebble^\updownarrow }(G,C,m,\mathtt {T}^\updownarrow ,{P_\mathsf{init}}^\updownarrow )\), and we add these pebbles to \({P_\mathsf{init}}^*\) if they are not already in there.

Note that then \(|{P_\mathsf{init}}^*|\le {|{P_\mathsf{init}}^\updownarrow |_{{\updownarrow }}}\) as required. Moreover, Eq. (7) holds since at any timestep, the nodes available in \(\mathsf{pebble^\updownarrow }(G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^\updownarrow )\) are nodes already pebbled in \(\mathsf{pebble}(G,C,m,{\mathtt {A}}_\mathsf{greedy},{P_\mathsf{init}}^*)\) at the same timestep.    \(\square \)

2.4 The Parallel Random Oracle Model (pROM)

We turn to an analogue of the above pebbling games in the parallel random oracle model (pROM) [3]. In particular, let \(G=(V,E)\) be a DAG with a dedicated set \(C\subseteq V\) of challenge nodes; we identify the vertices with \(V=[n]\). A labelling \(\ell _1,\ldots ,\ell _n\) of G’s vertices using a hash function \(\mathsf{h}:\{0,1\}^{*}\rightarrow \{0,1\}^w\) is defined as follows. Let \(\mathsf{parent}(i)=\{j\in V:(j,i)\in E\}\) denote the parents of i; then

$$\begin{aligned} \ell _i=\mathsf{h}(i, \ell _{p_1},\ldots ,\ell _{p_d})\quad \text { where }\quad (p_1,\ldots ,p_d)=\mathsf{parent}(i) \end{aligned}$$
(10)

Note that if i is a source, then its label is simply \(\ell _i=\mathsf{h}(i)\).
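Computing such a labelling is straightforward. The Python sketch below does so with SHA-256 standing in for \(\mathsf{h}\) and a fixed-width encoding of the vertex index (any unambiguous input encoding would do), assuming the vertices \(1,\ldots,n\) are numbered in topological order, as is the case for \(L_n\).

```python
import hashlib

def all_labels(parents, n):
    # Labels l_i = h(i, l_{p_1}, ..., l_{p_d}) of Eq. (10); sources get l_i = h(i).
    h = lambda *parts: hashlib.sha256(b"".join(parts)).digest()
    ell = {}
    for i in range(1, n + 1):                    # topological order assumed
        ell[i] = h(i.to_bytes(8, "big"), *(ell[p] for p in parents[i]))
    return ell

# Line graph L_3: l_1 = h(1), l_2 = h(2, l_1), l_3 = h(3, l_2).
print(all_labels({1: [], 2: [1], 3: [2]}, n=3)[3].hex())
```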

Fig. 3. The labeling game \(\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\).

We consider a game \(\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) in which an algorithm \({\mathtt {A}}\) must, m times in a row, compute the label of a node chosen at random from C. \({\mathtt {A}}\) gets an initial state \(\sigma _0={\sigma _\mathsf{init}}\). The cumulative memory complexity is defined as follows.

$$\begin{aligned} \mathsf{cmc}^{\texttt {pROM}}(G, C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})= & {} \underset{\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})}{\mathbb {E}}\left[ {\mathsf{cost}}\right] \\ \mathsf{cmc}^{\texttt {pROM}}(G,C,m,{\sigma _\mathsf{init}})= & {} \min _{\mathtt {A}}\underset{\mathsf{h}\leftarrow \mathcal{H}}{\mathbb {E}}\left[ \mathsf{cmc}^{\texttt {pROM}}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\right] \end{aligned}$$

The time complexity of a given adversary is

$$\mathsf{time}^\texttt {pROM}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) = \underset{\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})}{\mathbb {E}}[{\mathsf{cnt}}]$$

We will also consider this notion with respect to the best adversaries from some restricted class of adversaries; in this case we write the class as a subscript, as in

$$\begin{aligned} \mathsf{cmc}^{\texttt {pROM}}_\mathcal{A}(G,C,m,{\sigma _\mathsf{init}})= & {} \min _{{\mathtt {A}}\in \mathcal{A}}\underset{\mathsf{h}\leftarrow \mathcal{H}}{\mathbb {E}}\left[ \mathsf{cmc}^{\texttt {pROM}}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\right] \end{aligned}$$

As for pebbling, we will also consider the more robust \(\epsilon \)-variants of these notions:

$$\begin{aligned} \mathsf{cmc}^{\texttt {pROM}}_\epsilon (G, C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})= & {} \sup \left\{ \gamma \ :\ \underset{\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})}{\mathbb {P}}\left[ \mathsf{cost}\ge \gamma \right] \ge 1-{\epsilon }\right\} \\ \mathsf{cmc}^{\texttt {pROM}}_\epsilon (G,C,m,{\sigma _\mathsf{init}})= & {} \min _{\mathtt {A}}\underset{\mathsf{h}\leftarrow \mathcal{H}}{\mathbb {E}}\left[ \mathsf{cmc}^{\texttt {pROM}}_\epsilon (G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\right] \\ \mathsf{time}^\texttt {pROM}_{\epsilon }(G, C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})= & {} \sup \left\{ \gamma \ :\ \underset{\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})}{\mathbb {P}}\left[ \mathsf{cnt}\ge \gamma \right] \ge 1-{\epsilon }\right\} \end{aligned}$$

2.5 scrypt and the \(\mathsf{computeLabel}\) Game

We informally discuss the relation between evaluating scrypt in the pROM and the \(\mathsf{computeLabel}\) game for the line graph (described below), and explain why we will focus on the latter. A similar discussion applies to Argon2d.

First, recall that scrypt uses a hash function \(\mathsf{h}: \{0,1\}^* \rightarrow \{0,1\}^w\), and proceeds in two phases, given an input X. In the first phase, it computes \(X_i = \mathsf{h}^{i}(X)\) for all \(i \in [n]\), and in the second phase, setting \(S_0 = X_n\), it computes \(S_1, \ldots , S_n\) defined recursively to be

$$ S_i = \mathsf{h}(S_{i-1} \oplus X_{\mathsf {int}(S_{i-1})}) $$

where \(\mathsf {int}(S)\) reduces a w-bit string S to an integer in [n] such that if S is uniformly random, then \(\mathsf {int}(S)\) is (close to) uniform over [n]. The final output is \(\texttt {scrypt} ^\mathsf{h}_n(X) = S_n\). To show that \(\texttt {scrypt} \) is memory-hard, we need to lower-bound the CMC required to compute it in the pROM.

We argue that to obtain this bound, it suffices to lower-bound \(\mathsf{cmc}^{\texttt {pROM}}(L_n, [n], n)\), where \(L_n=(V,E)\) is the line graph with \(V=[n]\) and \(E=\{(i,i+1)\ :\ i\in [n-1]\}\). Intuitively, this is rather easy to see. Clearly, any algorithm which hopes to evaluate scrypt with more than negligible probability must, at some point, compute all \(X_i\) values and all \(S_j\) values, since guessing them is almost impossible. Moreover, until \(S_{i-1}\) has been computed, the value of \(\mathsf {int}(S_{i-1})\) – i.e., the challenge label needed to compute \(S_i\) – is uniformly random and independent, just like the \(i^{th}\) challenge \(c{\leftarrow }C\) in the \(\mathsf{computeLabel}\) game. In other words, once an algorithm has computed the values \(X_1, \ldots , X_n\), computing the values \(S_1, \ldots , S_n\) corresponds exactly to playing the \(\mathsf{computeLabel}\) game on the graph \(L_n\) with challenge set [n] for n rounds.

In summary, once \({\mathtt {A}}\) has finished the first phase of evaluating scrypt, the second phase essentially corresponds to playing the \(\mathsf{computeLabel}\) game on the graph \(L_n\) with challenge set [n] for n rounds. The initial state \({\sigma _\mathsf{init}}\) in \(\mathsf{computeLabel}\) is the state given to \({\mathtt {A}}\) as input in the first step of round 1 (i.e., in the step when \({\mathtt {A}}\) first computes \(X_n\)). It is now immediate that (when restricted to strategies which don’t simply guess relevant outputs of \(\mathsf{h}\)) any strategy \({\mathtt {A}}\) for computing the second phase of scrypt is essentially a strategy for playing \(\mathsf{computeLabel}(L_n, [n],n)\). Clearly, the total CMC of \({\mathtt {A}}\) when computing both phases of scrypt is at least the CMC of computing just the second. Thus our lower bound on \(\mathsf{cmc}^{\texttt {pROM}}(L_n,[n],n)\) in Theorem 15 also gives us a lower bound on the CMC of \(\texttt {scrypt} _n\). (The proof is rather tedious, and omitted from this version of the paper.)

Simple Algorithms. Theorem 15 below will make no restrictions on the algorithm playing \(\mathsf{computeLabel}\), at the cost of relying on \(\gamma _n\), for which we only conjecture an upper bound. We do not need such conjectures if we restrict our attention to simple algorithms from the class \(\mathcal{A}_{SA}\): A simple algorithm \({\mathtt {A}}\in \mathcal{A}_{SA}\) is one which either stores a value \(X_i\) directly in its intermediary states or stores nothing about the value of \(X_i\) at all. (They are, however, permitted to store arbitrary other information in their states.) For example, a simple algorithm may not store, say, \(X_i \oplus X_j\) or just the first 20 bits of \(X_i\). We note that, to the best of our knowledge, all algorithms in the literature for computing scrypt (or any memory-hard function, for that matter) are indeed of this form. For simple algorithms, we then obtain an unconditional lower bound on the CMC of scrypt by using Theorem 4 below, which only considers pebbling games.

Much as in the more general case above, for the class of algorithms \(\mathcal{A}_{SA}\) we can now draw a parallel between computing phase two of scrypt in the pROM and playing the game \(\mathsf{pebble}\) on the graph \(L_n\) with challenge set [n] for n rounds. Therefore, Theorem 4 immediately gives us a lower bound on the CMC of \(\texttt {scrypt} _n\) for all algorithms in \(\mathcal{A}_{SA}\).

Entangled Adversaries. In fact, we can even relax our restrictions on algorithms computing scrypt to the class \(\mathcal{A}_{EA}\) of entangled algorithms while still obtaining an unconditional lower bound on the CMC of scrypt. In addition to what is permitted for simple algorithms, we also allow storing “entangled” information about the values \(X_1, \ldots , X_n\) of the following form. For any subset \(L\subseteq [n]\) and integer \(t\in [|L|]\), an algorithm can store an encoding of \(X_L = \{X_i\}_{i\in L}\) such that if it obtains any t values in \(X_L\), then it can immediately output all remaining \(|L|-t\) values with no further information or queries to \(\mathsf{h}\). One such encoding uses polynomial interpolation, as described in the introduction. Indeed, this motivates our definition of entangled pebbles above.

As shown in the full version, the class \(\mathcal{A}_{EA}\) is (in general) strictly more powerful than \(\mathcal{A}_{SA}\) when it comes to minimizing CMC. Thus we obtain a more general unconditional lower bound on the CMC of scrypt using Theorem 9, which lower-bounds \(\mathsf{cc}^\updownarrow (L_n, [n], n, n)\), the entangled cumulative pebbling complexity of \(L_n\).

3 Pebbling Lower Bounds for the Line Graph

In this section, we prove lower bounds for the cumulative complexity of the n-round probabilistic pebbling game on the line graph \(L_n\) with challenges from [n]. We will start with the case without entanglement (i.e., dealing only with black pebbles) which captures the essence of our proof, and then below, extend our proof approach to the entangled case.

Theorem 4

(Pebbling Complexity of the Line Graph). For all \(0 \le k \le n\) and constant \(\delta > 0\),

$$\textsf {cc}_{\delta }(L_n, C = [n], n, k) = \varOmega _\delta \left( \frac{n^{2}}{\log ^2(n)}\right) .$$

We note in passing that the above theorem can be extended to handle a different number of challenges \(t \ne n\), as will be clear from the proof. We dispense with the more general theorem, and stick with the simpler statement for the common case \(t = n\) motivated by \(\texttt {scrypt} \). The notation \(\varOmega _{\delta }\) indicates that the constant hidden in the \(\varOmega \) depends on \(\delta \).

In fact, we also note that our proof allows for more concrete statements as a function of \(\delta \), which need not be constant. However, not surprisingly, the bound becomes weaker the smaller \(\delta \) is; note that if we are only interested in the expectation \(\textsf {cc}(L_n, C = [n], n, k)\), then applying the result with \(\delta = O(1)\) (e.g., \(\frac{1}{2}\)) is sufficient to obtain a lower bound of \(\varOmega \left( \frac{n^{2}}{\log ^2 n}\right) \).

Proof intuition – the expectation game. Before we turn to the formal proof, we give some high-level intuition. It turns out that most of the proof in fact lower bounds the cc of a much simpler game, whose goal is far simpler than covering challenges from [n] with a pebble. In fact, the game will be completely deterministic.

The key observation is that every time a new challenge \(c_i\) is drawn, and the player has reached a certain pebbling configuration P, there is a well-defined expected number \(\varPhi (P)\) of steps the adversary needs to take at the very least in order to cover the random challenge. We refer to \(\varPhi (P)\) as the potential of P. In particular, the best strategy is the greedy one, which looks at the largest \(j = j(c_i) \le c_i\) on which a pebble is placed, i.e., \(j \in P\), and then needs to output a valid sequence of at least \(c_i - j\) further pebbling configurations, such that the last configuration contains \(c_i\). Note that if \(j = c_i\), we still need to perform one step to output a valid configuration. Therefore, \(\varPhi (P)\) is the expected value of \(\max ( 1, c_i - j(c_i) )\). We will consider a new game – called the expectation game – which has the property that at the beginning of every stage, the challenger just computes \(\varPhi (P)\), and expects the player \(\mathtt {T}\) to take \(\varPhi (P)\) legal steps until \(\mathtt {T}\) can move to the next stage.

Note that these steps can be totally arbitrary – there is no actual challenge any more to cover. Still, we will be interested in lower bounding the cumulative complexity of such a strategy for the expectation game, and it is not obvious how \(\mathtt {T}\) can keep the cc low. Indeed:

  • If the potential is high, say \(\varPhi (P) = \varOmega (n)\), then this means that linearly many steps must be taken to move to the next stage, and since every configuration contains at least one pebble, we pay a cumulative cost of \(\varOmega (n)\) for the present stage.

  • Conversely, if the potential \(\varPhi (P)\) is low (e.g., O(1)), then we can expect to be faster. However, we will show that this implies that there are many pebbles in P (at least \(\varOmega (n/\varPhi (P))\)), and thus one can expect high cumulative cost again, i.e., linear \(\varOmega (n)\).

However, there is a catch – the above statements refer to the initial configurations. The fact that we have many pebbles at the beginning of a stage and at its end does not mean we have many pebbles throughout the whole stage. Even though the strategy \(\mathtt {T}\) is forced to pay \(\varPhi (P)\) steps, the strategy may try to drop as many pebbles as possible for a while, and then add them back again. Excluding that this can happen is the crux of our proof. We will indeed show that for the expectation game, any strategy incurs cumulative complexity roughly \(\varOmega (n^2/\log ^2(n))\). The core of the analysis will be understanding the behavior of the potential function throughout a stage.

Now, we can expect that a low-cc strategy \(\mathtt {T}\) for the original parallel pebbling game on \(L_n\) gives us one for the expectation game too – after all, for every challenge, the strategy \(\mathtt {T}\) needs to perform roughly \(\varPhi (P)\) steps from the initial pebbling configuration when learning the challenge. This is almost correct, but again, there is a small catch. The issue is that \(\varPhi (P)\) is only an expectation, yet we want the guarantee that we go for \(\varPhi (P)\) steps with sufficiently high probability (this is particularly crucial if we want to prove a statement which is parameterized by \(\delta \)). However, this is fairly simple (if somewhat tedious) to overcome – the idea is that we partition the n challenges into \(n/\lambda \) groups of \(\lambda \) challenges. For every such group, we look at the initial configuration P when learning the first of the next \(\lambda \) challenges, and note that with sufficiently high probability (all but roughly \(e^{-\varOmega (\lambda ^2)}\), by a Chernoff bound) there will be one challenge (among these \(\lambda \) ones) which is at least (say) \(\varPhi (P)/2\) away from the closest pebble. This allows us to reduce a strategy for the n-challenge pebbling game on \(L_n\) to a strategy for the \((n/\lambda )\)-round expectation game. The value of \(\lambda \) can be chosen small enough not to affect the overall analysis.

Proof

(Theorem 4 ). As the first step in the proof, we are going to reduce playing the game \(\mathsf{pebble}(L_n,C = [n],n,\mathtt {T}, {P_\mathsf{init}})\), for an arbitrary player \(\mathtt {T}\) and initial pebbling configuration \({P_\mathsf{init}}\) (\(|{P_\mathsf{init}}| \le k\)), to a simpler (and somewhat different) pebbling game, which we refer to as the expectation game.

To this end, we first introduce the concept of a potential function \(\varPhi : 2^{[n]} \rightarrow \mathbb {R}_{\ge 0}\). The potential of a pebbling configuration \(P=\{\ell _1,\ell _2,\ldots ,\ell _m\}\subseteq [n]\) is

$$\begin{aligned} \varPhi (P):= & {} \tfrac{m}{n} + \tfrac{1}{n} \sum _{i=0}^{m} \left( 1+ \ldots + (\ell _{i+1}-\ell _{i}-1) \right) \\= & {} \tfrac{m}{n} + \tfrac{1}{2n} \sum _{i=0}^{m}(\ell _{i+1}-\ell _i) \cdot (\ell _{i+1}-\ell _i-1) = \tfrac{1}{2n}\sum _{i=0}^{m} (\ell _{i+1}-\ell _i)^2 - \tfrac{n+1 - 2m}{2n} \end{aligned}$$

Here \(m=|P|\) and we let \(\ell _0 = 0\) and \(\ell _{m+1} = n + 1\) as notational placeholders. Indeed, \(\varPhi (P)\) is the expected number of moves required (by an optimal strategy) to pebble a random challenge starting from the pebbling configuration P, where the expectation is over the choice of the random challenge. (Note in particular it is required to pay at least one move even if a pebble is already on the challenge node.) In other words, \(\varPhi (P)\) is exactly \(\mathsf{time}(L_n, [n], 1, \mathtt {T}^*, P)\) for the optimal strategy \(\mathtt {T}^*\).
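Concretely, with the sentinel conventions \(\ell _0 = 0\) and \(\ell _{m+1} = n+1\), the closed form is a few lines of Python, and it can be sanity-checked against the defining expectation directly (a minimal sketch with our own helper names):

```python
def potential(P, n):
    # Closed form: (1/2n) * sum of squared gaps - (n + 1 - 2m) / 2n.
    ls = [0] + sorted(P) + [n + 1]           # sentinels l_0 = 0, l_{m+1} = n + 1
    gaps2 = sum((b - a) ** 2 for a, b in zip(ls, ls[1:]))
    return gaps2 / (2 * n) - (n + 1 - 2 * len(P)) / (2 * n)

def potential_direct(P, n):
    # Defining expectation: E over challenges c of max(1, c - j(c)),
    # where j(c) is the largest pebbled vertex <= c (0 if there is none).
    return sum(max(1, c - max([v for v in P if v <= c], default=0))
               for c in range(1, n + 1)) / n

P, n = {3, 7}, 10
assert abs(potential(P, n) - potential_direct(P, n)) < 1e-9
print(potential(P, n))  # 1.7
```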

Now we are ready to introduce the expectation game, which involves no challenges. At the beginning of every stage, the challenger only computes \(\varPhi (P)\), and expects the player \(\mathtt {T}\) to take \(\varPhi (P)\) steps until it can move to the next stage. The game \(\mathsf {expect}(n, t, \mathtt {T}, {P_\mathsf{init}})\) is played by a pebbler \(\mathtt {T}\) as depicted in Fig. 4.

Fig. 4. The expectation game.

In the following, for a (randomized) pebbler \(\mathtt {T}\) and initial configuration \({P_\mathsf{init}}\), we write \(\mathsf {expect}_{n, t}(\mathtt {T}, {P_\mathsf{init}})\) for the output of the expectation game; note the output only depends on the randomness of pebbler \(\mathtt {T}\) and configuration \({P_\mathsf{init}}\). We similarly define the cumulative complexity of the expectation game

$$\begin{aligned} \textsf {cc}_{\delta }(\mathsf {expect}_{n, t}(\mathtt {T},{P_\mathsf{init}})):= & {} \sup \left\{ \gamma \ :\ \underset{\mathsf {expect}(n,t,\mathtt {T},{P_\mathsf{init}})}{\mathbb {P}}\left[ {\mathsf{cost}}\ge \gamma \right] \ge 1-{\delta }\right\} \\ \textsf {cc}_{\delta }(\mathsf {expect}_{n, t, k} ):= & {} \min _{\underset{|{P_\mathsf{init}}|\le k}{\mathtt {T},{P_\mathsf{init}}\subseteq V}}\left\{ \textsf {cc}_{\delta }(\mathsf {expect}_{n, t}(\mathtt {T},{P_\mathsf{init}})) \right\} \end{aligned}$$

The expectation game \(\mathsf {expect}_{n, t, k}\) has an important feature: because the randomness is only over the pebbler’s coins, these coins can be fixed to their optimal choice without making the overall cc worse. This implies that \(\textsf {cc}_{\delta }(\mathsf {expect}_{n, t, k}) = \textsf {cc}_{0}(\mathsf {expect}_{n,t,k})\) for all \(\delta \ge 0\). In particular, we use the shorthand \(\textsf {cc}(\mathsf {expect}_{n,t,k})\) for the latter.

The remainder of the proof consists of the following two lemmas. Below, we combine these two lemmas in the final statement, before turning to their proofs. (The proof of Lemma 5 is deferred to the full version for lack of space, and relies on the intuition given above.)

Lemma 5

(Reduction to the Expectation Game). For all \(n, t, k, \lambda \), and any \(\delta > 3\mu (t,\lambda )\), we have

$$\begin{aligned} \textsf {cc}(\mathsf {expect}_{n,t, k}) = \textsf {cc}_{\delta - 3\mu (t, \lambda )}(\mathsf {expect}_{n, t, k}) \le 2 \cdot \textsf {cc}_{\delta }(L_n, C = [n], t \cdot \lambda , k) \;, \end{aligned}$$

where \(\mu (t, \lambda ) = t \cdot e^{- \lambda ^2 /8}\).

To give some intuition about the bound, note that in general, for every \(\delta ' \le \delta \), we have \(\textsf {cc}_{\delta '}(\mathsf {expect}_{n, t, k}) \le \textsf {cc}_{\delta }(\mathsf {expect}_{n, t, k})\). This is because if c is such that for all \(\mathtt {T}\) and \({P_\mathsf{init}}\) we have \(\Pr [\mathsf {expect}_{n,t}(\mathtt {T}, {P_\mathsf{init}}) \ge c] \ge 1 - \delta '\), then also \(\Pr [\mathsf {expect}_{n,t}(\mathtt {T}, {P_\mathsf{init}}) \ge c] \ge 1 - \delta \). Thus the set from which we are taking the supremum only grows bigger as \(\delta \) increases. In the specific case of Lemma 5, the \(3\mu (t,\lambda )\) offset captures the loss of our reduction.

Lemma 6

(CC Complexity of the Expectation Game). For all \(t, 0 \le k \le n\) and \(\epsilon > 0\), we have

$$\begin{aligned} \textsf {cc}(\mathsf {expect}_{n,t,k}) \ge \left\lfloor \frac{\epsilon t}{2} \right\rfloor \cdot \frac{n^{1 - \epsilon }}{6} \;. \end{aligned}$$

To conclude the proof before turning to the proofs of the above two lemmas, we choose \(t, \lambda \) such that \(t \cdot \lambda = n\), and \(\mu (t, \lambda ) = t \cdot e^{- \lambda ^2 /8} < \delta /3\). We also set \(\epsilon = 0.5 \log \log (n)/\log (n)\), and note that in this case \(n^{1 - \epsilon } = n/\sqrt{\log (n)}\). In particular, we can set \(\lambda = O(\sqrt{\log t})\), and can choose e.g. \(t = n/\sqrt{\log n}\). Then, by Lemma 6,

$$\begin{aligned} \textsf {cc}(\mathsf {expect}_{n,t, k}) \ge \left\lfloor \frac{\epsilon t}{2} \right\rfloor \cdot \frac{n^{1 - \epsilon }}{6} = \varOmega \left( \frac{n^2}{\log ^2(n)} \right) \;. \end{aligned}$$

This concludes the proof of Theorem 4.

Proof

(Proof of Lemma 6). First, we observe that if a pebbling configuration P has potential \(\varPhi (P)\), the size \(|P|\) of the pebbling configuration (i.e., the number of vertices on which a pebble is placed) is at least \(\frac{n}{6 \cdot \varPhi (P)}\). We give a formal proof for completeness.

Lemma 7

For every non-empty pebbling configuration \(P \subseteq [n]\), we have

$$\begin{aligned} \varPhi (P) \cdot |P| \ge \frac{n}{6} \;. \end{aligned}$$

Proof

Let \(m=|P| \ge 1\). By the definition of the potential:

$$\begin{aligned} \varPhi (P) = \frac{1}{2n}\sum _{i=0}^{m}(\ell _{i+1}-\ell _i)^2-\frac{n+1-2m}{2n} \;, \end{aligned}$$

where \(\ell _0 = 0\) and \(\ell _{m+1} = n + 1\) are notational placeholders. Since \(\varPhi (P) \ge 1\) and \(m \ge 1\), we have \(\frac{n+1-2m}{2n} \le \frac{1}{2} \le \frac{1}{2} \cdot \varPhi (P)\). Therefore

$$ \varPhi (P) \ge \frac{2}{3} \cdot \frac{1}{2n}\sum _{i=0}^{m}(\ell _{i+1}-\ell _i)^2 \;, $$

Since \(m \ge \frac{m+1}{2}\), multiplying the left side by m and the right side by \(\frac{m+1}{2}\) yields

$$ \varPhi (P) \cdot m \ge \frac{2}{3} \left( \frac{1}{2n}\sum _{i=0}^{m}(\ell _{i+1}-\ell _i)^2 \right) \cdot \frac{m+1}{2} = \frac{1}{6n} \left( \sum _{i=0}^{m} (\ell _{i+1}-\ell _i)^2\right) \cdot (m+1) $$

Therefore \(\varPhi (P) \cdot m \ge \frac{n}{6}\) follows, since by the Cauchy-Schwarz inequality we have

$$ \left( \sum _{i=0}^{m} (\ell _{i+1}-\ell _i)^2\right) \cdot (m+1) \ge \left( \sum _{i=0}^{m} (\ell _{i+1}-\ell _i) \right) ^2 \ge n^2 \;. $$

   \(\square \)
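Lemma 7 is also easy to confirm exhaustively for small n; the following Python sketch (reusing potential() from above, with a tiny tolerance for floating-point rounding) checks every non-empty configuration:

```python
from itertools import combinations

n = 12
for r in range(1, n + 1):                       # all non-empty P of each size r
    for P in combinations(range(1, n + 1), r):
        assert potential(set(P), n) * len(P) >= n / 6 - 1e-9
print("Lemma 7 verified for all non-empty P with n =", n)
```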

Also, the following claim provides an important property of the potential function.

Lemma 8

In one iteration, the potential can decrease by at most one.

Proof

Consider an arbitrary configuration \(P=\{\ell _1,\ell _2,\ldots ,\ell _m\}\subseteq [n]\). The best that a pebbling algorithm can do to decrease the potential in a single iteration is to place a new pebble next to each of the current pebbles – let us call the new configuration \(P'\). That is,

$$\begin{aligned} P'=\{\ell _1,\ell _1+1,\ell _2,\ell _2+1,\ldots ,\ell _m,\ell _m+1\}\subseteq [n]. \end{aligned}$$

The potential of the new configuration is

$$\begin{aligned} \varPhi (P')&= \frac{1}{2n} \left( \ell _{1}^2 + \sum _{i=1}^{m} 1+ (\ell _{i+1}-(\ell _i+1))^2\right) - \frac{n+1-2|P'|}{2n} \end{aligned}$$
(11)
$$\begin{aligned}&\ge \frac{1}{2n} \left( m + \sum _{i=0}^{m} \left( (\ell _{i+1}-\ell _i)^2-2(\ell _{i+1}-\ell _i)+1 \right) \right) - \frac{n+1-2|P'|}{2n}\end{aligned}$$
(12)
$$\begin{aligned}&\ge \frac{1}{2n} \left( m + \sum _{i=0}^{m} \left( (\ell _{i+1}-\ell _i)^2-2(\ell _{i+1}-\ell _i)+1\right) \right) - \frac{n+1-2m}{2n}\end{aligned}$$
(13)
$$\begin{aligned}&\ge \varPhi (P)+\frac{m}{n}-\frac{1}{n}\sum _{i=0}^{m} (\ell _{i+1}-\ell _i)\ge \varPhi (P)-1 \end{aligned}$$
(14)

where the first inequality holds because \(\ell _1 \ge 1\), and the second because \(|P'| \ge m\).    \(\square \)
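Lemma 8 can likewise be checked numerically: on small line graphs, even the most aggressive single step (the configuration \(P'\) above, with nothing removed and new pebbles clipped to [n]) never lowers the potential by more than one. A minimal sketch, reusing potential() and combinations from the checks above:

```python
n = 12
for r in range(1, n + 1):
    for P in combinations(range(1, n + 1), r):
        P2 = set(P) | {l + 1 for l in P if l + 1 <= n}   # the P' from the proof
        assert potential(P2, n) >= potential(set(P), n) - 1 - 1e-9
print("Lemma 8 verified for n =", n)
```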

Assume without loss of generality that the pebbler \(\mathtt {T}\) is legal and deterministic. Consider a particular round \(i \in [t]\) of the expectation game. Let P and \(P'\) denote the initial and final pebbling configurations in the i-th round, and let us denote by \(\phi _i = \varPhi (P)\) the potential of the initial configuration in round i. Depending on the value of \(\varPhi (P')\), we classify the pebbling sequence from P to \(P'\) into three different categories:

  • Type 1: \(\varPhi (P')>\phi _i \cdot n^{\epsilon /2}\); or

  • Type 2: \(\varPhi (P')\le \phi _i \cdot n^{\epsilon /2}\) – we have two sub-cases:

    • Type 2a: the potential was always less than \(\phi _i \cdot n^{\epsilon }\) for all the intermediate pebbling configurations from P to \(P'\); or

    • Type 2b: the potential went above \(\phi _i \cdot n^{\epsilon }\) for some intermediate configuration.

With each type, we associate a cost that the pebbling algorithm has to pay, which lower bounds the contribution to the cumulative complexity of the pebbling configurations generated during this stage. The pebbling algorithm can carry out pebbling sequences of Type 1 for free – however, the latter two types have accompanying costs.

  • For pebbling sequences of Type 2a, the corresponding cumulative cost is at least \(\phi _i \cdot \frac{n}{6 \cdot \phi _i n^{\epsilon } } = \frac{1}{6} n^{1-\epsilon }\), since by Lemma 7 the size of the pebbling configuration is never less than \(\frac{n}{6 \phi _i n^{\epsilon } }\) during all intermediate iterations, and in stage i a valid pebbler must produce at least \(\phi _i\) configurations.

  • For sequences of Type 2b, it follows from Lemma 8 that it takes at least \(\phi _i (n^{\epsilon } - n^{\epsilon /2})\) steps to decrease the potential from \(\phi _i \cdot n^{\epsilon }\) to \(\phi _i \cdot n^{\epsilon /2}\), and the size of the pebbling configuration is at least \(\frac{n}{6 \phi _i n^{\epsilon }}\) in every intermediate step by Lemma 7. Therefore, the cumulative cost is at least

    $$\begin{aligned} \phi _i (n^{\epsilon } - n^{\epsilon /2}) \cdot \frac{n}{6 \phi _i n^{\epsilon }} \ge \frac{n}{6} - \frac{n^{1- \epsilon /2}}{6} \ge \frac{1}{6} n^{1-\epsilon } \;, \end{aligned}$$

where the last inequality follows for sufficiently large n.

To conclude the proof, we partition the \(t \ge \lceil 2/\epsilon \rceil \) rounds into groups of \( \lceil 2/\epsilon \rceil \) consecutive rounds. We observe that every group must contain at least one pebbling sequence of Type 2: otherwise, with \(\phi \) denoting the potential at the beginning of the first round of the group, the potential at the end would be strictly larger than

$$\begin{aligned} \phi n^{\frac{\epsilon }{2} \cdot \frac{2}{\epsilon } } \ge \phi \cdot n > n/2 \end{aligned}$$

which is impossible, as the potential is at most \(\frac{n}{2}\). By the above, however, the cumulative complexity of each group is at least \(\frac{n^{1-\epsilon }}{6}\), and thus we get

$$\begin{aligned} cc(\mathsf {expect}_{n,t,k}) \ge \left\lfloor \frac{\epsilon t}{2} \right\rfloor \cdot \frac{n^{1-\epsilon }}{6} \;, \end{aligned}$$
(15)

which concludes the proof of Lemma 6.    \(\square \)

As our second result, we show that the above theorem also holds in the entangled case.

Theorem 9

(Entangled Pebbling Complexity of the Line Graph). For all \(0 \le k \le n\) and constant \(\delta >0\),

$$\mathsf{cc}^\updownarrow _{\delta }(L_n, C = [n], n,k) = \varOmega \left( \frac{n^{2}}{\log ^2 n}\right) \;.$$

Luckily, it will not be necessary to repeat the whole proof. We now give a proof sketch showing that, in essence, the proof follows the same format and arguments as the one for Theorem 4, using Lemma 3 as a tool.

Proof

(Sketch). One can prove the theorem following exactly the same framework of Theorem 4, with a few differences. First off, we define a natural entangled version of the expectation game where, in addition to allowing entanglement in a pebbling configuration, we define the potential as

$$\begin{aligned} \varPhi ^{\updownarrow }(P) = \mathsf{time}^\updownarrow (L_n, C = [n], 1, \mathtt {T}^{*, \updownarrow }, P) \;, \end{aligned}$$

i.e., the expected time complexity for one challenge of an optimal entangled strategy \(\mathtt {T}^{*, \updownarrow }\) starting from the (entangled) pebbling configuration P.

Then, a proof similar to the one of Lemma 5, based on a Chernoff bound, can be used to show that if we separate the challenges into t chunks of \(\lambda \) challenges each, and we look at the configuration P at the beginning of each of the t chunks, then there exists at least one challenge (out of \(\lambda \)) which requires spending time \(\varPhi ^{\updownarrow }(P)\) to be covered, except with small probability.

A lower bound on the cumulative complexity of the (entangled) expectation game follows along exactly the same lines as the proof of Lemma 6. This is because the following two facts (which correspond to the two lemmas in the proof of Lemma 6) are true also in the setting with entanglement:

  • First off, for every P and \(\mathtt {T}^{*, \updownarrow }\) such that \(\varPhi ^{\updownarrow }(P) = \mathsf{time}^\updownarrow (L_n, C = [n], 1, \mathtt {T}^{*, \updownarrow }, P)\), Lemma 3 guarantees that there exist a (regular) pebbling strategy \(\mathtt {T}^{'}\) and a (regular) pebbling configuration \(P'\) such that \({|P|_{{\updownarrow }}} \ge |P'|\) and

    $$\begin{aligned} \begin{aligned} \varPhi ^{\updownarrow }(P)&= \mathsf{time}^\updownarrow (L_n, C = [n], 1, \mathtt {T}^{*, \updownarrow }, P) \\&\ge \mathsf{time}(L_n, C = [n], 1, \mathtt {T}^{'}, P') \ge \varPhi (P') \;. \end{aligned} \end{aligned}$$

    Therefore, by Lemma 7,

    $$\begin{aligned} {|P|_{{\updownarrow }}} \cdot \varPhi ^{\updownarrow }(P) \ge |P'| \cdot \varPhi (P') \ge \frac{n}{6} \;. \end{aligned}$$
    (16)
  • Second, the potential can decrease by at most one in an arbitrary step from a configuration P to a configuration \(P'\). This holds by definition: assume it were not the case, i.e., \(\varPhi ^{\updownarrow }(P') < \varPhi ^{\updownarrow }(P) - 1\). Then there exists a strategy for covering a random challenge starting from P which first moves to \(P'\) in one step, and then applies the optimal strategy achieving expected time \(\varPhi ^{\updownarrow }(P')\). The expected number of steps taken by this strategy is smaller than \(\varPhi ^{\updownarrow }(P)\), contradicting the fact that \(\varPhi ^{\updownarrow }(P)\) is the minimal expected number of steps required by any strategy.    \(\square \)

4 From Pebbling to pROM

4.1 Transcripts and Traces

Below we define the notion of a trace and transcript, which will allow us to relate the \(\mathsf{computeLabel}\) and \(\mathsf{pebble^\updownarrow }\) experiments. For any possible sequence of challenges \({{\varvec{c}}}\in C^m\), let \(\mathsf{cnt}_{{\varvec{c}}}\) denote the number of steps (i.e., the variable \(\mathsf{cnt}\)) made in the \(\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) experiment conditioned on the m challenges being \({{\varvec{c}}}\) (note that once \({{\varvec{c}}}\) is fixed, the entire experiment is deterministic, so \(\mathsf{cnt}_{{\varvec{c}}}\) is well defined). Let \(\tau _{{\varvec{c}}}=q_1|q_2|\ldots |q_{\mathsf{cnt}_{{\varvec{c}}}}\) be the trace of the computation: here \(q_1\subset [n]\) means that the first batch of parallel queries are the queries required to output the labels \(\{\ell _i,i\in q_1\}\), etc.

For example, for the graph in Fig. 5, \(\tau _7=2|4,5|7\) corresponds to a first query \(\ell _2=\mathsf{h}(2)\), then two parallel queries \(\ell _4=\mathsf{h}(4,\ell _1),\ell _5=\mathsf{h}(5,\ell _2)\), and then the final query computing the label of the challenge \(\ell _7=\mathsf{h}(7,\ell _4,\ell _5,\ell _6)\).

A trace as a pebbling. We can think of a trace as a parallel pebbling, e.g., \(\tau _7=2|4,5|7\) means we pebble node 2 in the first step, nodes 4, 5 in the second, and 7 in the last step. We say that an initial (entangled) pebbling configuration \({P_\mathsf{init}}\) is consistent with a trace \(\tau \) if, starting from \({P_\mathsf{init}}\), \(\tau \) is a valid pebbling sequence. E.g., consider again the traces \(\tau _7=2|4,5|7, \tau _8=3|6|8\) for the graph in Fig. 5; then \({P_\mathsf{init}}=\{1,5,6\}\) is consistent with \(\tau _7\) and \(\tau _8\), and it is the smallest initial pebbling having this property (see the sketch below). In the entangled case, \({P_\mathsf{init}}^\updownarrow =\{{\langle 1 \rangle _{0}},{\langle 5,6 \rangle _{1}}\}\) is consistent with \(\tau _7,\tau _8\). Note that in the entangled case we only need a pebbling configuration of weight 2, whereas the smallest pebbling configuration for the standard pebbling game has weight 3. In fact, there are traces where the weights of the smallest normal and the smallest entangled pebbling configurations consistent with all the traces differ by a factor of \(\varTheta (n)\).
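To make "consistent with a trace" concrete, here is a minimal Python sketch of the check for regular pebblings. The edge set of Fig. 5 is our reconstruction from the hash calls spelled out in this section; the parents of nodes 3, 6 and 8 are not fully specified in the text and are assumptions.

```python
# Hypothetical parent map for the graph of Fig. 5, reconstructed from the
# hash calls above; the parents of 3, 6 and 8 are guesses for illustration.
parents = {1: [], 2: [], 3: [], 4: [1], 5: [2],
           6: [3], 7: [4, 5, 6], 8: [5, 6]}

def consistent(p_init, trace):
    """Check that the trace (a list of parallel batches of queried nodes)
    is a valid pebbling when starting from the pebble set p_init."""
    pebbled = set(p_init)
    for batch in trace:
        # every parent of every node queried in this batch must be pebbled
        if any(v not in pebbled for u in batch for v in parents[u]):
            return False
        pebbled |= set(batch)
    return True

tau7, tau8 = [[2], [4, 5], [7]], [[3], [6], [8]]
print(consistent({1, 5, 6}, tau7), consistent({1, 5, 6}, tau8))  # True True
print(consistent({1, 5}, tau7))  # False: node 7 needs 6 pebbled
```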

Turning a trace into a transcript. We define the implications \(T_{{\varvec{c}}}\) of a trace \(\tau _{{\varvec{c}}}=q_1|q_2|\ldots |q_{\mathsf{cnt}_{{\varvec{c}}}}\) as follows. For \(i=1,\ldots ,\mathsf{cnt}_{{\varvec{c}}}\), we add the implication \((v_i)\rightarrow (f_i)\), where \(v_i\subset [n]\) denotes all the vertices whose labels have appeared either as inputs or outputs in the experiment so far, and \(f_i\) denotes the vertices whose labels are contained in the inputs of this round but have never appeared before (if the guess for the challenge label in this round is non-empty, i.e., \(\ell \ne \bot \), then we include \(\ell \) in \(f_i\)).

Fig. 5. Graph used in Example 10.

Example 10

Consider the graph from Fig. 5 with \(m=1\) and challenge set \(C=\{7,8\}\), and traces

$$ \tau _7=2|4,5|7 \quad \text {and}\quad \tau _8=3|6|8 $$

We have

$$\begin{aligned} T_7=\{(2)\rightarrow 1, (1,2,4,5)\rightarrow 6\}\quad T_8=\{(3,6)\rightarrow 5 \} \end{aligned}$$
(17)

where, e.g., \((2)\rightarrow 1\) is included since the first query is \(\ell _2=\mathsf{h}(2)\), and the second batch queries \(\ell _4=\mathsf{h}(4,\ell _1)\) and, in parallel, \(\ell _5=\mathsf{h}(5,\ell _2)\). At this point we have so far only observed the label \(\ell _2\), i.e., \(v_2=\{2\}\), so the label \(\ell _1\) used as input in this query is fresh, i.e., \(f_2=\{1\}\), which means we add the implication \((2)\rightarrow 1\).
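The rule just illustrated can be run mechanically. The following Python sketch is ours: each trace is given as a list of parallel batches, each batch mapping a queried vertex to the vertices whose labels appear in the query's input (as spelled out in the example); the special case of a non-empty challenge-label guess \(\ell \ne \bot \) is ignored. It recomputes the implications of Eq. (17):

```python
def implications(trace):
    """Extract the implications of a trace; each batch maps a queried
    vertex to the vertices whose labels appear in the query's input."""
    seen, T = set(), []                 # seen plays the role of v_i
    for batch in trace:
        inputs = {p for ps in batch.values() for p in ps}
        fresh = inputs - seen           # f_i: input labels never seen before
        if fresh:
            T.append((frozenset(seen), frozenset(fresh)))
        seen |= inputs | set(batch)     # inputs and outputs are now observed
    return T

tau7 = [{2: []}, {4: [1], 5: [2]}, {7: [4, 5, 6]}]
tau8 = [{3: []}, {6: [3]}, {8: [5, 6]}]
print(implications(tau7))  # T_7: (2) -> 1 and (1,2,4,5) -> 6
print(implications(tau8))  # T_8: (3,6) -> 5
```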

Above we formalised how to extract a transcript \(T_{{\varvec{c}}}\) from \((G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\), with

$$T(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})=\cup _{{{\varvec{c}}}\in C^m}T_{{\varvec{c}}}$$

we denote the union of all \(T_{{\varvec{c}}}\)’s.

4.2 Extractability, Coverability and a Conjecture

In this section we introduce the notions of extractability and coverability of a transcript. Below we first give some intuition for what these notions have to do with the \(\mathsf{computeLabel}\) and \(\mathsf{pebble^\updownarrow }\) experiments.

Extractability intuition. Consider the experiment \(\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\). We can invoke \({\mathtt {A}}\) on some particular challenge sequence \({{\varvec{c}}}\in C^m\), and if at some point \({\mathtt {A}}\) makes a query whose input contains a label \(\ell _i\) which has not appeared before, we can “extract” this value from \(({\mathtt {A}},{\sigma _\mathsf{init}})\) without actually querying \(\mathsf{h}\) for it. More generally, we can run \({\mathtt {A}}\) on several challenge sequences, scheduling the queries in a way that maximises the number of labels that can be extracted from \(({\mathtt {A}},{\sigma _\mathsf{init}})\). To compute this number, we do not need to know the entire input/output behaviour of \({\mathtt {A}}\) for all possible challenge sequences; the transcript \(T=T(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) is sufficient. Recall that T contains implications like \((1,5,6)\rightarrow 3\), which means that for some challenge sequence, there is some point in the experiment where \({\mathtt {A}}\) has already seen the labels \(\ell _1,\ell _5,\ell _6\), and at this point makes a query whose input contains a label \(\ell _3\) (that has not been observed before). Thus, given \({\sigma _\mathsf{init}}\) and \(\ell _1,\ell _5,\ell _6\) we can learn \(\ell _3\).

We denote with ex(T) the maximum number of labels that can be extracted from T. If the labels are uniformly random values in \(\{0,1\}^w\), then it follows that \({\sigma _\mathsf{init}}\) will almost certainly not be much smaller than \(ex(T)\cdot w\), as otherwise we could compress \(w\cdot ex(T)\) uniformly random bits (i.e., the extracted labels) to a string which is shorter than their length, but uniformly random values are not compressible.

Coverability intuition. In the following, we say that an entangled pebbling experiment \(\mathsf{pebble^\updownarrow }(G,C,m,\mathsf{P},{P_\mathsf{init}}^\updownarrow )\) mimics the \(\mathsf{computeLabel}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) experiment if for every challenge sequence the following is true: whenever \({\mathtt {A}}\) makes a query to compute some label \(\ell _{i}=h(i,\ell _{p_1},\ldots ,\ell _{p_t})\), \(\mathsf{P}\) puts a (normal) pebble on i. For this, \({P_\mathsf{init}}^\updownarrow \) must contain (entangled) pebbles that cover every implication in T (as defined above): e.g., if \((1,5,6)\rightarrow 3\in T\), then from the initial pebbling \({P_\mathsf{init}}^\updownarrow \) together with the pebbles \({\langle 1 \rangle _{0}},{\langle 5 \rangle _{0}},{\langle 6 \rangle _{0}}\) seen so far it must be possible to derive \({\langle 3 \rangle _{0}}\), i.e., \({\langle 3 \rangle _{0}}\in \mathsf{closure}({P_\mathsf{init}}^\updownarrow \cup \{{\langle 1 \rangle _{0}},{\langle 5 \rangle _{0}},{\langle 6 \rangle _{0}}\})\). We say that such an initial state \({P_\mathsf{init}}^\updownarrow \) covers T. We are interested in the maximum possible ratio \(\max _{T\text { over }[n]}\min _{{P_\mathsf{init}}^\updownarrow \text { covers }T}{{|{P_\mathsf{init}}^\updownarrow |_{{\updownarrow }}}}/{ex(T)}\), which we will denote by \(\gamma _n\); thus, if a transcript T is k-extractable, it can be covered by an initial pebbling \({P_\mathsf{init}}^\updownarrow \) of weight \(\gamma _n\cdot k\). The best lower bound we currently have on \(\gamma _n\) is 3/2; we conjecture that \(\gamma _n\) is small, i.e., \(\mathrm {polylog}(n)\) or even constant. We will prove in Sect. 4.3 that pebbling time complexity implies pROM time complexity for any graph, and in Sect. 4.4 that CC complexity implies cumulative complexity in the pROM model for the scrypt graph. The loss in our reductions depends on \(\gamma _n\). Assuming \(\gamma _n=\varTheta (1)\) we get the best bounds one can hope for, but already \(\gamma _n\in o(n)\) would give the first non-trivial bounds on pROM complexity.

Definitions. Let \(n\in \mathbb {N}\). An “implication” \((\mathcal{X})\rightarrow z\) given by a value \(z\in [n]\) and a subset \(\mathcal{X}\subset [n]\setminus z\) means that “knowing \(\mathcal{X}\) gives z for free”. We use \((\mathcal{X})\rightarrow \mathcal{Z}\) as a shortcut for the set of implications \(\{(\mathcal{X})\rightarrow z\ :\ z\in \mathcal{Z}\}\).

A transcript is a set of implications. Consider a transcript \(T=\{\alpha _1,\ldots ,\alpha _\ell \}\), each \(\alpha _i\) being an implication. We say that a transcript T is k-extractable (\(0\le k\le n\)) if there exists an extractor E that makes at most \(n-k\) queries in the following game:

  • At any time E can query for a value in [n].

  • If E has received the values \(\mathcal{L}\subset [n]\) and there exists an implication \((\mathcal{X})\rightarrow z\in T\) with \(\mathcal{X}\subset \mathcal{L}\), then E gets the value z “for free”.

  • The game is over when E has received all of [n].

Every (even an empty) transcript T is 0-extractable, as E can always simply ignore T and query for \(1,2,\ldots ,n\). Let

$$ ex(T)=\max _{k}(T\text { is }k\text {-extractable}) $$

Example 11

Let \(n=5\) and consider the transcript

$$\begin{aligned} T=\{ (1,2)\rightarrow 3,(2,3)\rightarrow 1,(3,4)\rightarrow 2,(1)\rightarrow 4\} \end{aligned}$$
(18)

This transcript is 2-extractable but not 3-extractable. To see 2-extractability, consider the E which first asks for 1 and gets 4 for free (due to \((1)\rightarrow 4\)), then asks for 2 and gets 3 for free (due to \((1,2)\rightarrow 3\)), and finally asks for 5 – a total of \(3=n-2\) queries.
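For small n, \(ex(T)\) can be computed by exhaustive search over the order of the extractor's explicit queries. The following Python sketch (the helper names are ours) recomputes \(ex(T)=2\) for the transcript of Eq. (18):

```python
from functools import lru_cache

def implied_closure(known, T):
    # Repeatedly add everything the implications in T give "for free".
    known = set(known)
    while True:
        new = {z for X, z in T if z not in known and X <= known}
        if not new:
            return frozenset(known)
        known |= new

def ex(n, T):
    """ex(T) = n minus the minimum number of explicit queries needed to
    learn all of [n], taking implied values for free."""
    full = frozenset(range(1, n + 1))

    @lru_cache(maxsize=None)
    def queries(known):  # minimum number of explicit queries still needed
        if known == full:
            return 0
        return 1 + min(queries(implied_closure(known | {z}, T))
                       for z in full - known)

    return n - queries(implied_closure(frozenset(), T))

# The transcript from Eq. (18)
T = [(frozenset({1, 2}), 3), (frozenset({2, 3}), 1),
     (frozenset({3, 4}), 2), (frozenset({1}), 4)]
print(ex(5, T))  # 2
```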

A set S of entangled pebbles covers an implication \((\mathcal{X})\rightarrow z\) if \(z\in \mathsf{closure}(S\cup {\langle \mathcal{X} \rangle _{0}})\), with \(\mathsf{closure}\) as defined in Definition 1.

Definition 12

(k-coverable). We say that a transcript T is k-coverable if there exists a set of entangled pebbles S of total weight k such that every implication in T is covered by S. With \(cw(T)\) we denote the minimum weight of an S covering T:

$$ cw(T)=\min _{S\text { that covers }T}{|S|_{{\updownarrow }}} $$

Note that every transcript is trivially n-coverable using the pebble \({\langle 1,\ldots ,n \rangle _{0}}\) of weight n, which covers every possible implication. For the 2-extractable transcript from Example 11, a set of pebbles of total weight 2 covering it is

$$\begin{aligned} S=\{{\langle 1,2,3 \rangle _{2}},{\langle 1,4 \rangle _{1}}\} \end{aligned}$$
(19)

For example, \((3,4)\rightarrow 2\) is covered as \(2\in \mathsf{closure}({\langle 1,2,3 \rangle _{2}},{\langle 1,4 \rangle _{1}},{\langle 3,4 \rangle _{0}})=\{1,2,3,4\}\): we can first set \(\varGamma =\{3,4\}\) (using \({\langle 3,4 \rangle _{0}}\)), then \(\varGamma =\{1,3,4\}\) using \({\langle 1,4 \rangle _{1}}\), and then \(\varGamma =\{1,2,3,4\}\) using \({\langle 1,2,3 \rangle _{2}}\).
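The closure computation just illustrated is a simple fixed-point iteration. Below is a minimal sketch; it assumes the rule from Definition 1 has the form used in the walk-through above, namely that an entangled pebble \({\langle Y \rangle _{t}}\) (of weight \(|Y|-t\)) releases all of Y as soon as at least t elements of Y are already in \(\varGamma \). It re-checks that the set S of Eq. (19) has weight 2 and covers the transcript of Example 11:

```python
def closure(pebbles):
    # Fixed point: <Y>_t releases all of Y once >= t of Y is already known.
    known, changed = set(), True
    while changed:
        changed = False
        for Y, t in pebbles:
            if not Y <= known and len(Y & known) >= t:
                known |= Y
                changed = True
    return known

def covers(S, T):
    # (X) -> z is covered iff z is in the closure of S plus the pebble <X>_0
    return all(z in closure(S + [(X, 0)]) for X, z in T)

def weight(S):
    return sum(len(Y) - t for Y, t in S)

# Eq. (19): the weight-2 cover of the transcript from Example 11
S = [(frozenset({1, 2, 3}), 2), (frozenset({1, 4}), 1)]
T = [(frozenset({1, 2}), 3), (frozenset({2, 3}), 1),
     (frozenset({3, 4}), 2), (frozenset({1}), 4)]
print(weight(S), covers(S, T))  # 2 True
```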

We will be interested in the size of the smallest cover for a transcript T. One could conjecture that every k-extractable transcript is k-coverable. Unfortunately, this is not true: consider the transcript

$$\begin{aligned} T^*=\{(2,5)\rightarrow 1,(1,3)\rightarrow 2,(2,4)\rightarrow 3,(3,5)\rightarrow 4,(1,4)\rightarrow 5\} \end{aligned}$$
(20)

We have \(ex(T^*)=2\) (e.g., query 2, 4, 5 and extract 1, 3 using \((2,5)\rightarrow 1,(2,4)\rightarrow 3\)), but \(T^*\) is not 2-coverable (a cover of weight 3 is, e.g., \(\{{\langle 5,1 \rangle _{1}},{\langle 2,3,4 \rangle _{1}}\}\)). With \(\gamma _n\) we denote the highest coverability-to-extractability ratio that a transcript over [n] can have:

Conjecture 13

Let

$$\gamma _n = \max _{T\text { over }[n]}\min _{S\text { that covers }T} \frac{{|S|_{{\updownarrow }}}}{ex(T)}= \max _{T\text { over }[n]}\frac{cw(T)}{ex(T)}$$

then (weak conjecture) \(\gamma _n \in \mathrm {polylog}(n)\), or even (strong conjecture) \(\gamma _n\in \varTheta (1)\).

By the example in Eq. (20) above, \(\gamma _n\ge \gamma _5\ge 3/2\). We will update the full version of this paper as we become aware of progress on (dis)proving this conjecture. In the full version we also introduce another parameter shannon(w), which can give better lower bounds on the size of a state required to realize a given transcript in terms of Shannon entropy.
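Assuming the ex, weight and covers sketches from above are in scope, the numbers behind this bound can be re-checked mechanically. Note that this only verifies \(ex(T^*)=2\) and the given weight-3 cover; it does not verify that no weight-2 cover exists, which is the part argued above.

```python
# T* from Eq. (20) and the weight-3 cover given in the text
T_star = [(frozenset({2, 5}), 1), (frozenset({1, 3}), 2),
          (frozenset({2, 4}), 3), (frozenset({3, 5}), 4),
          (frozenset({1, 4}), 5)]
S3 = [(frozenset({5, 1}), 1), (frozenset({2, 3, 4}), 1)]
print(ex(5, T_star), weight(S3), covers(S3, T_star))  # 2 3 True
```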

4.3 Bounding pROM Time Using Pebbling Time

We are ultimately interested in proving lower bounds on time and cumulative complexity in the parallel ROM model. We first show that pebbling time complexity implies time complexity in the pROM model; the reduction is optimal up to a factor \(\gamma _n\). Under Conjecture 13, this basically answers the main open problem left in the Proofs of Space paper [9]. In the theorem below we need the label length w to be in the order of \(m\log (n)\) to get a lower bound on \(|{\sigma _\mathsf{init}}|\). For the proofs of space application, where \(m=1\), this is a very weak requirement, but for scrypt, where \(m=n\), this means we require rather long labels (the number of queries q will be \(\le n^2\), so the \(\log (q)\) term can be ignored).

Theorem 14

Consider any \(G=(V,E),C\subseteq V,m\in \mathbb {N},\epsilon \ge 0\) and algorithm \({\mathtt {A}}\). Let \(n=|V|\) and \(\gamma _n\) be as in Conjecture 13. Let \(\mathcal{H}\) contain all functions \(\{0,1\}^*\rightarrow \{0,1\}^w\), then with probability \(1-2^{-\varDelta }\) over the choice of \(\mathsf{h}\leftarrow \mathcal{H}\) the following holds for every \({\sigma _\mathsf{init}}\in \{0,1\}^*\). Let q be an upper bound on the total number of \(\mathsf{h}\) queries made by \({\mathtt {A}}\) and let

$$ k=\frac{|{\sigma _\mathsf{init}}|+\varDelta }{ (w-m\log (n)-\log (q))} $$

(so \( |{\sigma _\mathsf{init}}|\approx k\cdot w\) for sufficiently large w), then

$$ \mathsf{time}^\texttt {pROM}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) \ge \mathsf{time}(G,C,m,\lceil k\cdot \gamma _n\rceil ) $$

and for every \(1>\epsilon \ge 0\)

$$ \mathsf{time}^\texttt {pROM}_\epsilon (G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) \ge \mathsf{time}_\epsilon (G,C,m,\lceil k\cdot \gamma _n\rceil ) $$

In other words, if the initial state is roughly \(k\cdot w\) bits large (i.e., it is sufficient to store k labels), then the pROM time complexity is as large as the pebbling time complexity of \(\mathsf{pebble}(G,C,m)\) for initial pebblings of size \(\lceil k\cdot \gamma _n\rceil \). Note that the above theorem is basically tight up to the factor \(\gamma _n\): consider an experiment \(\mathsf{time}(G,C,m,\mathsf{P},{P_\mathsf{init}})\) with \(k=|{P_\mathsf{init}}|\); then we can come up with a state \({\sigma _\mathsf{init}}\) of size \(k\cdot w\), namely \({\sigma _\mathsf{init}}=\{\ell _i,i\in {P_\mathsf{init}}\}\), and define \({\mathtt {A}}\) to mimic \(\mathsf{P}\), which then implies

$$\mathsf{time}^\texttt {pROM}_\epsilon (G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) = \mathsf{time}_\epsilon (G,C,m,\mathsf{P},{P_\mathsf{init}})\quad \text {with}\quad |{\sigma _\mathsf{init}}|=k\cdot w $$

in particular, if we let \(\mathsf{P},{P_\mathsf{init}}\) be the strategy and initial pebbling of size k minimising time complexity we get

$$\mathsf{time}^\texttt {pROM}_\epsilon (G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) = \mathsf{time}_\epsilon (G,C,m,k)\quad \text {with}\quad |{\sigma _\mathsf{init}}|=k\cdot w $$

Wlog. we will assume that \({\mathtt {A}}\) is deterministic (if \({\mathtt {A}}\) is probabilistic we can always fix some “optimal” coins). Below we prove two claims which imply Theorem 14.

Claim

With probability \(1-2^{-\varDelta }\) over the choice of \(\mathsf{h}\leftarrow \mathcal{H}\) the following holds: if the transcript \(T(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) is k-extractable, then

$$\begin{aligned} |{\sigma _\mathsf{init}}|\ge k\cdot (w-m\log (n)-\log (q))-\varDelta \end{aligned}$$
(21)

where q is an upper bound on the total number of \(\mathsf{h}\) queries made by \({\mathtt {A}}\).

Proof

Let L be an upper bound on the length of queries made by \({\mathtt {A}}\), so we can assume that the input domain of \(\mathsf{h}\) is finite, i.e., \(\mathsf{h}:\{0,1\}^{\le L}\rightarrow \{0,1\}^w\). Let \(|\mathsf{h}|=2^L\cdot w\) denote the size of \(\mathsf{h}\)’s function table.

Let \(i_1,\ldots ,i_k\) be the indices of the k labels \(\ell _{i_1},\ldots ,\ell _{i_k}\) that can be “extracted” (this choice need not be unique), and let \(\mathsf{h}^{-}\) denote the function table of \(\mathsf{h}\), but where the rows are in a different order (to be defined), and the rows corresponding to the queries that output the labels to be extracted are missing, so \(|\mathsf{h}|-|\mathsf{h}^{-}|=k\cdot w\).

Given the state \({\sigma _\mathsf{init}}\), the function table of \(\mathsf{h}^{-}\) and some extra information \(\alpha \) discussed below, we can reconstruct the entire function table of \(\mathsf{h}\). As this table is uniform, and a uniform string of length s cannot be compressed below \(s-\varDelta \) bits except with probability \(2^{-\varDelta }\), we get that with probability \(1-2^{-\varDelta }\) Eq. (21) must hold, i.e.,

$$ |{\sigma _\mathsf{init}}|+|\mathsf{h}^-|+|\alpha |\ge |\mathsf{h}|-\varDelta $$

as \(|\mathsf{h}|-|\mathsf{h}^{-}|=k\cdot w\) we get

$$ |{\sigma _\mathsf{init}}|\ge k\cdot w - |\alpha |-\varDelta $$

It remains to define \(\alpha \) and the order in which the values in \(\mathsf{h}^-\) are stored. For every label to be extracted, we specify on what challenge sequence to run the adversary \({\mathtt {A}}\), and where exactly in this execution the label we want to extract appears (as part of a query made by \({\mathtt {A}}\)). This requires up to \(m\log (n)+\log (q)\) bits for every label to be extracted, so

$$ |\alpha |\le k\cdot (m\cdot \log (n)+\log (q)) $$

The first part of \(\mathsf{h}^{-}\) now contains the outputs of \(\mathsf{h}\) in the order in which they are requested by the extraction procedure just outlined (if a query is made twice, then we have to remember it and not simply use the next entry in \(\mathsf{h}^{-}\)). Let us stress that we only store the w-bit long outputs, not the inputs; this is not a problem, as we learn the corresponding inputs during the extraction procedure. The entries of \(\mathsf{h}\) which are not used in this process and are not extracted labels make up the second part of the \(\mathsf{h}^-\) table. As we know for which inputs we are still missing the outputs, also here we just have to store the w-bit long outputs, ordered so that they correspond to the still-missing inputs in lexicographic order.

Let us mention that if \({\mathtt {A}}\) behaved nicely, in the sense that all its queries are on inputs which are actually required to compute the corresponding labels, then we would only need \(\log (n)\) bits of extra information per label, namely the indices \(i_1,\ldots ,i_k\). But as \({\mathtt {A}}\) can behave arbitrarily, we cannot tell whether \({\mathtt {A}}\) actually uses real labels as inputs or some junk, and thus must specify exactly where the real labels to be extracted show up.    \(\square \)
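For completeness, chaining the two displayed bounds gives exactly Eq. (21):

$$ |{\sigma _\mathsf{init}}| \ge k\cdot w - |\alpha |-\varDelta \ge k\cdot w - k\cdot (m\log (n)+\log (q))-\varDelta = k\cdot (w-m\log (n)-\log (q))-\varDelta \;. $$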

Claim

If the transcript \(T=T(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) is k-extractable (i.e., \(ex(T)=k\)), then

$$\begin{aligned} \mathsf{time}^\texttt {pROM}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) \ge \mathsf{time}(G,C,m,\lceil k\cdot \gamma _n\rceil ) \end{aligned}$$
(22)

and for any \(1>\epsilon \ge 0\)

$$\begin{aligned} \mathsf{time}^\texttt {pROM}_\epsilon (G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) \ge \mathsf{time}_\epsilon (G,C,m,\lceil k\cdot \gamma _n\rceil ) \end{aligned}$$
(23)

Proof

We will only prove the first statement, Eq. (22). As T is k-extractable, it is \(\lceil k\cdot \gamma _n\rceil \)-coverable by the definition of \(\gamma _n\), so there exists a pair \((\mathsf{P},P^\updownarrow )\) mimicking \({\mathtt {A}}\), where \(P^\updownarrow \) is of weight \(\le \lceil k\cdot \gamma _n\rceil \), such that

$$ \mathsf{time}^\updownarrow (G,C,m,\mathsf{P},P^\updownarrow )=\mathsf{time}^\texttt {pROM}(G,C,m,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h}) $$

The claim now follows as

$$\mathsf{time}^\updownarrow (G,C,m,\mathsf{P},P^\updownarrow )\ge \mathsf{time}^\updownarrow (G,C,m,\lceil k\cdot \gamma _n\rceil )= \mathsf{time}(G,C,m,\lceil k\cdot \gamma _n\rceil )$$

where the inequality follows by definition (recall that \({|P^\updownarrow |_{{\updownarrow }}}\le \lceil k\cdot \gamma _n\rceil \)) and the equality by Lemma 3, which states that for time complexity, entangled pebblings are not better than normal ones.    \(\square \)

Theorem 14 follows directly from the two claims above.

4.4 The CMC of the Line Graph

Throughout this section, \(L_n=(V,E), V=[n], E=\{(i,i+1)\ :\ i\in [n-1]\}\) denotes the path of length n, and the set of challenge nodes \(C=[n]\) contains all vertices. In Sect. 3 we showed that – with overwhelming probability over the choice of a function \(\mathsf{h}:\{0,1\}^*\rightarrow \{0,1\}^w\) – the cumulative parallel entangled pebbling complexity for pebbling n challenges on a path of length n is

$$ \mathsf{cc}^\updownarrow (L_n,C=[n],n,n)=\varOmega \left( {n^2}/{\log ^2(n)}\right) $$

This then implies a lower bound on the cumulative memory complexity in the pROM against the class \(\mathcal{A}^\updownarrow \) of adversaries which are only allowed to store “encodings” of labels:

$$ \mathsf{cmc}^{\texttt {pROM}}_{\mathcal{A}^\updownarrow }(L_n,C=[n],n,n)=\varOmega \left( w\cdot {n^2}/{\log ^2(n)}\right) $$

This strengthens previous results, which proved lower bounds only for CC complexity and thus implied security only against pROM adversaries that store plain labels. In the full version, we show that \(\mathsf{cc}^\updownarrow \) can be strictly lower than cc; thus, at least for some graphs, the ability to store encodings, and not just plain labels, can decrease the complexity.

In this section we show a lower bound on \(\mathsf{cmc}^{\texttt {pROM}}(\textsf {G}, C,m)\), i.e., without making any restrictions on the algorithm. Our bound will again depend on the parameter \(\gamma _n\) from Conjecture 13. We only sketch the proof as it basically follows the proof of Theorem 4.

Theorem 15

For any \(n\in \mathbb {N}\), let \(L_n=(V=[n],E=\{(i,i+1)\ :\ i\in [n-1]\})\) be the line of length n, let \(\gamma _n\) be as in Conjecture 13, and let the label length be \(w=\varOmega (n\log n)\). Then

$$ \mathsf{cmc}^{\texttt {pROM}}(L_n,C=[n],n,{\sigma _\mathsf{init}})=\varOmega \left( w\cdot {n^2}/{\log ^2(n)\cdot \gamma _n}\right) $$

and for every \(\epsilon >0\)

$$ \mathsf{cmc}^{\texttt {pROM}}_\epsilon (L_n,C=[n],n,{\sigma _\mathsf{init}})=\varOmega _\epsilon \left( w\cdot {n^2}/{\log ^2(n)\cdot \gamma _n}\right) $$

Proof

(sketch). We consider the experiment \(\mathsf{computeLabel}(L_n,C,n,{\mathtt {A}},{\sigma _\mathsf{init}},\mathsf{h})\) for the \({\mathtt {A}}\) achieving the minimal \(\mathsf{cmc}^{\texttt {pROM}}\) complexity when \(\mathsf{h}\) is chosen at random (we can assume \({\mathtt {A}}\) is deterministic). Let \((\mathsf{P},{P_\mathsf{init}})\) be such that \(\mathsf{pebble^\updownarrow }(L_n,C,n,\mathsf{P},{P_\mathsf{init}})\) mimics (as defined above) this experiment. By Theorem 9, \(\mathsf{cc}^\updownarrow (L_n, C = [n], n,n) =\varOmega \left( {n^{2}}/{\log ^2(n)}\right) \); unfortunately – unlike for time complexity – we do not see how this would directly imply a lower bound on \(\mathsf{cmc}^{\texttt {pROM}}\).

Fortunately, although Theorems 4 and 9 are about CC complexity, their proofs are based on time complexity: at any point in time, the “potential” of the current state lower bounds the time required to pebble a random challenge, and if the potential is small, then the state has to be large (cf. Eq. (16)).

For any \(0\le i\le n\) and \({{\varvec{c}}}\in C^i\) let \(\sigma _{{\varvec{c}}}\) denote the state in the experiment \(\mathsf{computeLabel}(L_n,C,n,{\mathtt {A}},{\sigma _\mathsf{init}}=\emptyset ,\mathsf{h})\) right after the i’th label has been computed by \({\mathtt {A}}\) and conditioned on the first i challenges being \({{\varvec{c}}}\) (as \({\mathtt {A}}\) is deterministic and we fixed the first i challenges, \(\sigma _{{\varvec{c}}}\) is well defined).

At this point, the remaining experiment is \(\mathsf{computeLabel}(L_n,C,n-i,{\mathtt {A}},\sigma _{{\varvec{c}}},\mathsf{h})\). Similarly, we let \(P_{{\varvec{c}}}\) denote the pebbling in the “mimicking” \(\mathsf{pebble^\updownarrow }(L_n,C,n-i,\mathsf{P},P_{{\varvec{c}}})\) experiment after \(\mathsf{P}\) has pebbled the challenge nodes \({{\varvec{c}}}\). Let \(P'_{{\varvec{c}}}\) be the entangled pebbling of the smallest possible weight such that there exists a \(\mathsf{P}'\) for which \(\mathsf{pebble^\updownarrow }(L_n,C,n-i,\mathsf{P},P_{{\varvec{c}}})\) and \(\mathsf{pebble^\updownarrow }(L_n,C,n-i,\mathsf{P}',P'_{{\varvec{c}}})\) make the same queries on all possible challenges.

The expected time complexity to pebble the \((i+1)\)-th challenge in \(\mathsf{pebble^\updownarrow }(L_n,C,n-i,\mathsf{P}',P'_{{\varvec{c}}})\) – and thus also in \(\mathsf{computeLabel}(L_n,C,n-i,{\mathtt {A}},\sigma _{{\varvec{c}}},\mathsf{h})\) – is at least \(n/(6 {|P'_{{\varvec{c}}}|_{{\updownarrow }}})\) by Eq. (16). And by Theorem 14, we can lower bound the size of the state \(\sigma _{{\varvec{c}}}\) as (assuming w is sufficiently large)

$$ |\sigma _{{\varvec{c}}}|\ge \varOmega (w\cdot {|P'_{{\varvec{c}}}|_{{\updownarrow }}}/\gamma _n) $$

The CC cost of computing the next, \((i+1)\)-th, label in \(\mathsf{computeLabel}(L_n,C,n-i,{\mathtt {A}},\sigma _{{\varvec{c}}},\mathsf{h})\) – if we assume that the state remains roughly around its initial size \(|\sigma _{{\varvec{c}}}|\) until the challenge is pebbled – is roughly (cf. the intuition for the expectation game given in Sect. 3)

$$ \frac{n}{2\cdot {|P'_{{\varvec{c}}}|_{{\updownarrow }}}} \cdot |\sigma _{{\varvec{c}}}|=\varOmega \left( \frac{n}{ {|P'_{{\varvec{c}}}|_{{\updownarrow }}}}\cdot \frac{ w\cdot {|P'_{{\varvec{c}}}|_{{\updownarrow }}}}{\gamma _n}\right) =\varOmega \left( \frac{n\cdot w}{\gamma _n }\right) $$

As there are n challenges, this would give an \(\varOmega (w\cdot {n^2}/{ \gamma _n})\) bound on the overall CC complexity. Of course, the above assumption that the state size never decreases is not true in general: an adversary can always choose to drop most of the pebbles once the challenge is known.

Note that in the above argument we do not actually use the size \(|\sigma _{{\varvec{c}}}|\) of the current state, but only argue using the potential of the lightest pebbling \(P'_{{\varvec{c}}}\) necessary to mimic the remaining experiment. Following the same argument as in Theorem 4 (in particular, using Lemma 8), one can show that for a \(1/\log (n)\) fraction of the challenges, the potential stays within a \(\log (n)\) factor of its initial size. This argument costs us a \(\log ^2(n)\) factor in the CC complexity, giving the claimed \(\varOmega \left( w\cdot {n^2}/{\log ^2(n)\cdot \gamma _n}\right) \) bound.    \(\square \)