1 Introduction

There exist various information theoretic notions of entropy that quantify the “uncertainty” of a random variable. A variable X has k bits of Shannon entropy if it cannot be compressed below k bits. In cryptography we mostly consider min-entropy, where we say that X has k bits of min-entropy, denoted \({\mathbf {H}}_{\infty }\left( X\right) =k\), if for any x, \(\Pr [X=x]\le 2^{-k}\).
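To see why min-entropy, rather than Shannon entropy, is the right notion for cryptographic applications, consider the following standard example (not specific to this paper): a variable that is easy to guess can still have large Shannon entropy.

$$\begin{aligned} \Pr [X=0^n]=\tfrac{1}{2},\qquad \Pr [X=x]=\tfrac{1}{2(2^n-1)}\ \text { for } x\ne 0^n \quad \Longrightarrow \quad {\mathbf {H}}(X)\approx \tfrac{n}{2}+1,\qquad {\mathbf {H}}_{\infty }\left( X\right) =1. \end{aligned}$$

Here an adversary guesses X with probability 1/2, so for cryptographic purposes X behaves like a single random bit, which is exactly what its min-entropy says.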

In a cryptographic context, we often have to deal with variables that only appear to have high entropy to computationally bounded observers. The most important case is pseudorandomness, where we say that \(X\in \{0,1\}^n\) is pseudorandom if it cannot be distinguished from the uniform distribution over \(\{0,1\}^n\).

More generally, we say that \(X\in \{0,1\}^n\) has \(k\le n\) bits of HILL pseudoentropy [12], denoted \({\mathbf {H}}^{\mathsf{{HILL}}}_{\epsilon ,s}(X)=k\), if it cannot be distinguished from some Y with \({\mathbf {H}}_{\infty }\left( Y\right) =k\) by any circuit of size s with advantage \(>\epsilon \). Note that we get pseudorandomness as a special case for \(k=n\). We refer to k as the quantity and to \((\epsilon ,s)\) as the quality of the entropy.

A weaker notion of pseudoentropy called Metric pseudoentropy [3] often comes up in security proofs. This notion is defined like HILL, but with the quantifiers exchanged: we only require that for every distinguisher there exists a distribution Y with \({\mathbf {H}}_{\infty }\left( Y\right) =k\) that fools this particular distinguisher (not one such Y that fools them all).

HILL pseudoentropy is named after the authors of the [12] paper where it was introduced as a tool for constructing a pseudorandom generator from any one-way function. Their construction and analysis were subsequently improved in a series of works [11, 13, 28]. A lower bound on the number of calls to the underlying one-way function was given by [14] (Footnote 1). More recently HILL pseudoentropy has been used in many other applications like leakage-resilient cryptography [6, 17], deterministic encryption [7] and memory delegation [4].

The two most important types of tools we have to manipulate pseudoentropy are chain rules and transformations from one notion into another. Unfortunately, the known transformations and chain rules lose large factors in the quality of the entropy, which results in poor quantitative security bounds that can be achieved using these tools. In this paper we provide lower bounds, showing that, unfortunately, the known results are tight (or almost tight for chain rules), at least when considering non-adaptive black-box reductions. Although black-box impossibility results have been overcome by non-black-box constructions in the past [2], we find it hard to imagine how non-black-box constructions or adaptivity could help in this setting. We believe that, relative to the oracles we construct, adaptive reductions are impossible as well, since adaptivity “obviously” is of no use here, but proving this seems hard. Our results are summarized in Figs. 1 and 2.

Complexity of the Adversary. In order to prove a black-box separation, we will construct an oracle and prove the separation unconditionally relative to this oracle, i.e., assuming all parties have access to it. This then shows that any construction/proof circumventing our separation in the plain model cannot be relativizing, which in particular rules out all black-box constructions [1, 16].

In the discussion below we measure the complexity of adversaries only in terms of the number of oracle queries. Of course, in the actual proof we also bound them in terms of circuit size. For our upper bounds the circuits will be of basically the same size as the number of oracle queries (so the number of oracle queries is a good indication of the actual size), whereas for the lower bounds, we can even consider circuits of exponential size, thus making the bounds stronger (basically, we just require that one cannot hard-code a large fraction of the function table of the oracle into the circuit).

Fig. 1.

Transformations: our bound compared to the state of the art. Our Theorem 1, stating that a loss of \({\varOmega }(\ln (1/\epsilon ')/\epsilon '^2)\) in circuit size is necessary for black-box reductions showing that deterministic Metric entropy implies randomized Metric entropy (if the advantage \(\epsilon '\) remains of the same order), requires \(\epsilon '= 2^{-O(n-k+1)}\) and thus \(\ln (1/\epsilon ')\in O(n-k+1)\), so there is no contradiction between the transformations from [3, 25] and our lower bound (i.e., the blue term is smaller than the red one). (Color figure online)

Transformations. It is often easy to prove that a variable \(X\in \{0,1\}^n\) has so-called Metric pseudoentropy against deterministic distinguishers, denoted \({\mathbf {H}}^{\mathsf{Metric},\mathsf{det},\{0,1\}}_{\epsilon ,s}(X)=k\). Unfortunately, this notion is usually too weak to be useful, as it only states that for every (deterministic, boolean) distinguisher there exists some Y with \({\mathbf {H}}_{\infty }\left( Y\right) =k\) that fools this particular distinguisher, but one usually needs a single Y that fools all (randomized) distinguishers; this is what HILL pseudoentropy captures.

Barak et al. [3] show that any variable \(X\in \{0,1\}^n\) that has Metric entropy also has the same amount of HILL entropy. Their proof uses the min-max theorem, and although it preserves the amount k of entropy, the quality drops from \((\epsilon ,s)\) to \((2\epsilon , {\varOmega }(s\cdot \epsilon ^2/n))\). A slightly better bound \( \left( 2\epsilon , {\varOmega }(s\cdot \epsilon ^2/(n+1-k)) \right) \) (where again k is the amount of Metric entropy) was given recently in [25]. The argument uses the min-max theorem and some results on convex approximation in \(L_p\) spaces.

In Theorem 1 we show that this is optimal – up to a small factor \({\varTheta }((n-k+1)/\ln (1/\epsilon ))\) – as a loss of \({\varOmega }(\ln (1/\epsilon )/\epsilon ^2)\) in circuit size is necessary for any black-box reduction. Note that for sufficiently small \(\epsilon \in 2^{-{\varOmega }(n-k+1)}\) our bound even matches the positive result up to a small constant factor.
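As a quick sanity check of this claim (a back-of-the-envelope calculation, not a statement from the paper), plug \(\epsilon = 2^{-(n-k+1)}\) into the lower bound:

$$\begin{aligned} \ln (1/\epsilon ) = (n-k+1)\ln 2 \quad \Longrightarrow \quad {\varOmega }\left( \frac{\ln (1/\epsilon )}{\epsilon ^2}\right) = {\varOmega }\left( \frac{n-k+1}{\epsilon ^2}\right) , \end{aligned}$$

which matches, up to a constant factor, the loss \(O((n+1-k)/\epsilon ^2)\) in circuit size incurred by the transformation of [25].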

The high-level idea of our separation is as follows: we construct an oracle \({{\mathcal {O}}}\) and a variable \(X\in \{0,1\}^n\) such that, relative to this oracle, X can be distinguished from any variable Y with high min-entropy when we can make one randomized query, but for any deterministic distinguisher \({\mathsf {A}}\), we can find a Y with high min-entropy which \({\mathsf {A}}\) cannot distinguish from X.

To define \({{\mathcal {O}}}\), we first choose a uniformly random subset \(S\subseteq \{0,1\}^n\) of size \(|S|=2^m\). Moreover, we choose a sufficiently large set of boolean functions \(D_1(\cdot ),\ldots ,D_h(\cdot )\) as follows: for every \(x\in S\) we set \(D_i(x)=1\) with probability 1/2 and for every \(x\not \in S\), \(D_i(x)=1\) with probability \(1/2+\delta \).

Given any x, we can distinguish \(x\in S\) from \(x\not \in S\) with advantage \(\approx 2\delta \) by querying \(D_i(x)\) for a random i. This shows that X cannot have much more than \(\log (|S|)=m\) bits of HILL entropy (in fact, even probabilistic Metric entropy), as any variable Y with \({\mathbf {H}}_{\infty }\left( Y\right) \geqslant m+1\) has at least half of its support outside S, and thus can be distinguished with advantage \(\approx 2\delta /2=\delta \) with one query as just explained. Concretely (recall that in this informal discussion we measure size simply by the number of oracle queries):

$$\begin{aligned} {\mathbf {H}}^{\mathsf{Metric},\mathsf{rand},\{0,1\}}_{\delta ,1}(X)\leqslant m+1 \end{aligned}$$

On the other hand, if the adversary is allowed q deterministic queries, then intuitively the best it can do is to query \(D_1(x),\ldots ,D_q(x)\) and guess that \(x\in S\) if less than a \(1/2+{\delta }/2\) fraction of the outputs is 1. But even if \(q=1/\delta ^2\), this strategy will fail with constant probability. Thus, we can choose a Y with large support outside S (and thus also high min-entropy) which will fool this adversary. This shows that X does have large Metric entropy against deterministic distinguishers, even if we allow the adversaries to run in time \(1/\delta ^2\). Concretely, we show that

$$\begin{aligned} {\mathbf {H}}^{\mathsf{Metric},\mathsf{det},\{0,1\}}_{{\varTheta }(\delta ),O(1/\delta ^2)}(X)\geqslant n - O(\log (1/\delta )) \end{aligned}$$
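The following small Python script (a self-contained illustration with an arbitrarily chosen bias \(\delta =0.05\), not a computation from the paper) evaluates the two error probabilities of the threshold test sketched above exactly, confirming that even with \(q=1/\delta ^2\) deterministic queries it errs with constant probability on both sides:

```python
from math import comb

def binom_tail_ge(n, p, k0):
    """Pr[Bin(n, p) >= k0], computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

delta = 0.05                               # illustrative bias, chosen arbitrarily
q = round(1 / delta**2)                    # 400 deterministic queries
threshold = round(q * (0.5 + delta / 2))   # guess "x in S" iff fewer than this many 1s

# x in S: each D_i(x) = 1 with probability 1/2
err_in_S = binom_tail_ge(q, 0.5, threshold)               # too many 1s -> wrong guess
# x not in S: each D_i(x) = 1 with probability 1/2 + delta
err_out_S = 1 - binom_tail_ge(q, 0.5 + delta, threshold)  # too few 1s -> wrong guess

print(q, round(err_in_S, 3), round(err_out_S, 3))
# both error probabilities come out around 0.15-0.17, i.e. constants, so a Y with
# a constant fraction of its support outside S fools this q-query adversary
```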

The Adversary. Let us stress that we show impossibility in the non-uniform setting, i.e., for any input length, the distinguisher circuit can depend arbitrarily on the oracle. Like in many non-uniform black-box separation results (including [19, 22, 24, 30, 31]), the type of adversaries for which we can rigorously prove the lower bound is not completely general, but the necessary restrictions seem “obviously” irrelevant. In particular, given some input x (where we must decide if \(x\in S\)), we only allow the adversary queries on input x. This doesn't seem like a real restriction, as the distribution of \(D_i(x')\) for any \(x'\ne x\) is independent of x, and thus such queries seem useless (but they can be used to make the success probability of the adversary on different inputs correlated, and this causes a problem in the proof). Moreover, we assume the adversary makes its queries non-adaptively, i.e., it chooses the indices \(i_1,\ldots ,i_q\) before seeing the outputs of the queries \(D_{i_1}(x),\ldots ,D_{i_q}(x)\). As the distribution of all the \(D_i\)'s is identical, this doesn't seem like a relevant restriction either.

Fig. 2.

Chain rules: our lower bounds compared to the state of the art. In the literature there are basically three approaches to proving a chain rule for HILL entropy. The first one reduces the problem to an efficient version of the dense model theorem [22], the second one uses the so-called auxiliary input simulator [17], and the last one uses a convex optimization framework [21, 26]. The last approach yields a chain rule with a loss of \(\approx 2^{m}/\epsilon ^2\) in circuit size, where m is the length of the leakage Z.

Chain Rules. Most (if not all) information theoretic entropy notions H(.) satisfy some kind of chain rule, which states that the entropy of a variable X, when conditioned on another variable Z, can decrease by at most the bitlength |Z| of Z, i.e., \(H(X|Z) \geqslant H(X)-|Z|\).

Such a chain rule also holds for some computational notions of entropy. For HILL entropy a chain rule was first proven in [6, 22] by a variant of the dense model theorem, and was improved by Fuller and Reyzin [8]. A different approach using a simulator was proposed in [17] and later improved by Vadhan and Zheng [29]. A unified approach, based on convex optimization techniques, was proposed recently in [21, 26], achieving the best bounds so far.

The “dense model theorem approach” [8] proceeds as follows: one shows that if X has k bits of HILL entropy, then X|Z has \(k-m\) (where \(Z\in \{0,1\}^m\)) bits of Metric entropy. In a second step one applies a Metric-to-HILL transformation, first proven by Barak et al. [3], to argue that X|Z also has large HILL entropy. The first step loses a factor \(2^m\) in advantage, the second another \(2^{2m}/\epsilon ^2\) in circuit size. Eventually, the loss in circuit size is \(2^{2m}/\epsilon ^2\) and the loss in advantage is \(2^{m}\), which measured in terms of the security ratio size/advantage gives a total loss of \(2^{m}/\epsilon ^2\).

A more direct “simulator” approach [29] loses only a multiplicative factor \(2^{m}/\epsilon ^2\) in circuit size (there’s also an additive \(1/\epsilon ^2\) term) but there is no loss in advantage. The additive term can be improved to only \(2^{m}\epsilon ^2\) as shown in [21, 26].

In this paper we show that a loss of \(2^m/\epsilon \) is necessary. Note that this is still a factor \(1/\epsilon \) away from the positive result. Our result as stated in Theorem 2 is a bit stronger than just outlined: we show that the loss is necessary even if we only want a bound on the “relaxed” HILL entropy of X|Z (a notion weaker than standard HILL).

To prove our lower bound, we construct an oracle \({{\mathcal {O}}}(.)\), together with a joint distribution \((X,Z)\in \{0,1\}^{n}\times \{0,1\}^{m}\). We want X to have high HILL entropy relative to \({{\mathcal {O}}}(.)\), but when conditioning on Z it should decrease as much as possible (in quantity and quality).

We first consider the case \(m=1\), i.e., the conditional part Z is just one bit. For \(n \gg \ell \gg m=1\) the oracle \({{\mathcal {O}}}(.)\) and the distribution (X, Z) are defined as follows. We sample (once and for all) two (disjoint) random subsets \({\mathcal {X}}_0,{\mathcal {X}}_1\subseteq \{0,1\}^n\) of size \(|{\mathcal {X}}_0|=|{\mathcal {X}}_1|=2^{\ell -1}\), and let \({\mathcal {X}}={\mathcal {X}}_0\cup {\mathcal {X}}_1\). The oracle \({{\mathcal {O}}}(.)\) on input x is defined as follows (below \(B_p\) denotes the Bernoulli distribution with parameter p, i.e., \(\Pr [b=1\ :\ b\leftarrow B_p]=p\)).

  • If \(x\in {\mathcal {X}}_0\) output a sample of \(B_{1/2+\delta }\).

  • If \(x\in {\mathcal {X}}_1\) output a sample of \(B_{1/2-\delta }\).

  • Otherwise, if \(x\not \in {\mathcal {X}}\), output a sample of \(B_{1/2}\).

Note that our oracle \({{\mathcal {O}}}(.)\) is probabilistic, but it can be “derandomized” as we'll explain at the beginning of Sect. 4. The joint distribution (X, Z) is sampled by first sampling a random bit \(Z\leftarrow \{0,1\}\) and then \(X\leftarrow {\mathcal {X}}_Z\).

Given a tuple (V, Z), we can distinguish the case \(V=X\) from the case where \(V=Y\) for any Y with large support outside of \({\mathcal {X}}\) (X has min-entropy \(\ell \), so let's say we take a variable Y with \({\mathbf {H}}_{\infty }\left( Y|Z\right) \geqslant \ell +1\), which will have at least half of its support outside \({\mathcal {X}}\)) with advantage \({\varTheta }(\delta )\) by querying \(\alpha \leftarrow {{\mathcal {O}}}(V)\) and outputting \(\beta =\alpha \oplus Z\).

  • If \((V,Z)=(X,Z)\) then \(\Pr [\beta =1]=1/2+\delta \). To see this, consider the case \(Z=0\): then \(\Pr [\beta =1]=\Pr [\alpha =1]=\Pr [{{\mathcal {O}}}(X)=1]=1/2+\delta \); the case \(Z=1\) is symmetric.

  • If \((V,Z)=(Y,Z)\) then \(\Pr [\beta =1]= \Pr [Y\not \in {\mathcal {X}}](1/2)+\Pr [Y\in {\mathcal {X}}](1/2+\delta )\le 1/2+\delta /2\).

Therefore X|Z doesn’t have \(\ell +1\) bits of HILL entropy

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL}}}_{\delta /2, 1}(X|Z)< \ell +1 \end{aligned}$$
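A minimal Monte-Carlo sketch of this construction for \(m=1\) (toy parameters \(n=16\), \(\ell =8\), \(\delta =0.05\), chosen only for illustration): it samples the sets \({\mathcal {X}}_0,{\mathcal {X}}_1\), the oracle, and the joint distribution (X, Z), and estimates the advantage of the one-query distinguisher \(\beta ={{\mathcal {O}}}(V)\oplus Z\) described above.

```python
import random

n, ell, delta = 16, 8, 0.05          # toy parameters, for illustration only

# sample two disjoint random sets X0, X1 of size 2^(ell-1) each
points = random.sample(range(2**n), 2**ell)
X0, X1 = set(points[:2**(ell - 1)]), set(points[2**(ell - 1):])

def oracle(x):
    """O(x): a sample of B_{1/2+delta} on X0, B_{1/2-delta} on X1, B_{1/2} elsewhere."""
    p = 0.5 + delta if x in X0 else 0.5 - delta if x in X1 else 0.5
    return 1 if random.random() < p else 0

def sample_XZ():
    """The joint distribution (X, Z): Z is a uniform bit, X is uniform on X_Z."""
    z = random.randrange(2)
    return random.choice(tuple(X1 if z else X0)), z

def beta(v, z):
    """The one-query distinguisher from the text: output O(v) xor z."""
    return oracle(v) ^ z

trials = 200_000
p_real = sum(beta(*sample_XZ()) for _ in range(trials)) / trials
p_unif = sum(beta(random.randrange(2**n), random.randrange(2)) for _ in range(trials)) / trials
print(p_real, p_unif)   # about 1/2 + delta versus about 1/2, i.e. advantage roughly delta
```

Here the comparison distribution is simply uniform over \(\{0,1\}^n\); any Y with most of its support outside \({\mathcal {X}}\) behaves the same way.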

On the other hand, we claim that X (without Z, but with access to \({{\mathcal {O}}}(.)\)) cannot be distinguished from the uniform distribution over \(\{0,1\}^n\) with advantage \({\varTheta }(\delta )\) unless we allow the distinguisher \({\varOmega }(1/\delta )\) oracle queries (the hidden constant in \({\varTheta }(\delta )\) can be made arbitrarily large by setting the hidden constant in \({\varOmega }(1/\delta )\) small enough), i.e.,

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL}}}_{{\varTheta }(\delta ), {\varOmega }(1/\delta )}(X)=n \end{aligned}$$
(1)

To see why (1) holds, we first note that given some V, a single oracle query is useless to tell whether \(V=X\) or \(V=U_n\): although in the case where \(V=X\in {\mathcal {X}}_Z\) the output \({{\mathcal {O}}}(X)\) will have bias \(\delta \), one can't decide in which direction the bias goes as Z is (unconditionally) pseudorandom. If we're allowed on the order of \(1/\delta ^2\) queries, we can distinguish X from \(U_n\) with constant advantage, as with \(1/\delta ^2\) samples one can distinguish the distribution \(B_{1/2+\delta }\) (or \(B_{1/2-\delta }\)) from \(B_{1/2}\) with constant advantage. If we just want \({\varTheta }(\delta )\) advantage, \({\varOmega }(1/\delta )\) samples are necessary, which proves (1). While it is easy to prove that for a coin with bias \(\delta \) one needs \(O\left( 1/\delta ^2\right) \) trials to achieve \(99\,\%\) certainty, finding the number of trials needed for a confidence level that is only o(1), as in our case, is more challenging. We solve this problem by a tricky application of Rényi divergences (Footnote 2). The statement of our “coin problem” with precise bounds is given in Lemma 3.
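To make the \(1/\delta \) versus \(1/\delta ^2\) regimes concrete, the sketch below (again with an arbitrarily chosen \(\delta =0.05\)) computes, exactly, the statistical distance between q fair coin flips and q flips of a coin whose bias is \(+\delta \) or \(-\delta \) with probability 1/2 each; since both distributions are exchangeable, the number of 1s is a sufficient statistic, so this distance upper-bounds the advantage of any distinguisher that bases its decision on q such samples.

```python
from math import comb

def binom_pmf(q, p, k):
    return comb(q, k) * p**k * (1 - p)**(q - k)

def best_advantage(q, delta):
    """Statistical distance between Bin(q, 1/2) and the symmetric mixture
    0.5*Bin(q, 1/2+delta) + 0.5*Bin(q, 1/2-delta)."""
    mix = lambda k: 0.5 * (binom_pmf(q, 0.5 + delta, k) + binom_pmf(q, 0.5 - delta, k))
    return 0.5 * sum(abs(binom_pmf(q, 0.5, k) - mix(k)) for k in range(q + 1))

delta = 0.05                                       # illustrative value only
for q in (round(1 / delta), round(1 / delta**2)):  # 20 vs. 400 samples
    print(q, round(best_advantage(q, delta), 4))
# the advantage scales roughly like q*delta^2: with ~1/delta samples it is only
# O(delta), and ~1/delta^2 samples are needed to reach constant advantage
```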

So far, we have only sketched the case \(m=1\). For \(m>1\), we define a random function \(\pi :\{0,1\}^n\rightarrow \{0,1\}^{m-1}\). The oracle now takes an extra \(m-1\) bit string j, and for \(x\in {\mathcal {X}}\), the output of \({{\mathcal {O}}}(x,j)\) only has bias \(\delta \) if \(\pi (x)=j\) (and is a uniform bit everywhere else). We define the joint distribution (X, Z) by sampling \(X\leftarrow {\mathcal {X}}\), defining \(Z'\) s.t. \(X\in {\mathcal {X}}_{Z'}\), and setting \(Z=\pi (X)\Vert Z'\). Now, given Z, we can make one query \(\alpha \leftarrow {{\mathcal {O}}}(V,Z[1\ldots m-1])\) and output \(\beta =\alpha \oplus Z[m]\), and, as before, we get advantage \(\delta \) in distinguishing X from any Y with min-entropy \(\ge \ell +1\).

On the other hand, given some V (but no Z) it is now even harder to tell if \(V=X\) or \(V=Y\). Not only do we not know in which direction the bias goes, as before in the case \(m=1\) (this information is encoded in the last bit Z[m] of Z), but we also don't know at which index \(\pi (V)\) (in the case \(V=X\)) we have to query the oracle to observe any bias at all. As there are \(2^{m-1}\) possible choices for \(\pi (V)\), this intuitively means we need \(2^{m-1}\) times as many samples as before to observe any bias, which generalizes (1) to

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL}}}_{{\varTheta }(\delta ), {\varOmega }(2^{m-1}/\delta ^{})}(X)=n \end{aligned}$$

1.1 Some Implications of Our Lower Bounds

Leakage Resilient Cryptography. The chain rule for HILL entropy is a main technical tool used in several security proofs like the construction of leakage-resilient schemes [6, 20]. Here, the quantitative bound provided by the chain rule directly translates into the amount of leakage these constructions can tolerate. Our Theorem 2 implies a lower bound on the necessary security degradation for this proof technique. This degradation is, unfortunately, rather severe: even if we just leak \(m=1\) bit, we will lose a factor \(2^m/\epsilon \), which for a typical security parameter \(\epsilon =2^{-80}\) means a security degradation of “80 bits”.

Let us also mention that Theorem 2 answers a question raised by Fuller and Reyzin [8], showing that for any chain rule the simultaneous loss in quality and quantity is necessary (Footnote 3).

Faking Auxiliary Inputs. [17, 27, 29] consider the question of how efficiently one can “fake” auxiliary inputs. Concretely, given any joint distribution (X, Z) with \(Z\in \{0,1\}^m\), construct an efficient simulator h s.t. (X, h(X)) is \((\epsilon ,s)\)-indistinguishable from (X, Z). For example, [29] gives a simulator h of complexity \(O\left( 2^{m}\epsilon ^{-2}\cdot s\right) \) (plus additive terms independent of s). This result has found many applications in leakage-resilient crypto, complexity theory and zero-knowledge. The best known lower bound (assuming exponentially hard OWFs) is \({{\varOmega }}\left( \max (2^{m},1/\epsilon )\right) \). Since the chain rule for relaxed HILL entropy follows by a simulator argument [17] with the same complexity loss, our Theorem 2 yields a better lower bound of \({{\varOmega }}\left( 2^{m}/\epsilon \right) \) on the complexity of simulating auxiliary inputs.

Dense Model Theorem. The computational dense model theorem [22] says, roughly speaking, that dense subsets of pseudorandom distributions are computationally indistinguishable from truly dense distributions. It has found applications including differential privacy, memory delegation, graph decompositions and additive combinatorics. It is well known that the worst-case chain rule for HILL entropy is equivalent to the dense model theorem, as one can think of dense distributions as uniform distributions X given short leakage Z. For settings with constant density, which correspond to \(|Z|=O\left( 1\right) \), HILL and relaxed HILL entropy are equivalent [17]; moreover, the complexity loss in the chain rule is then equal to the cost of transforming Metric entropy into HILL entropy. Now our Theorem 1 implies a necessary loss in circuit size of \({{\varOmega }}\left( 1/\epsilon ^2\right) \) if one wants \(\epsilon \)-indistinguishability. This way we reprove the tight lower bound due to Zhang [31] for constant densities.

2 Basic Definitions

Let \(X_1\) and \(X_2\) be two distributions over the same finite set. The statistical distance of \(X_1\) and \(X_2\) equals \(\mathrm {SD}\left( X_1 ; X_2 \right) = \frac{1}{2}\sum _{x}\left| \Pr [X_1=x] - \Pr [X_2=x]\right| \).

Definition 1

(Min-Entropy). A random variable X has min-entropy k, denoted by \({\mathbf {H}}_{\infty }\left( X\right) =k\), if \(\max _x\Pr [X=x]\le 2^{-k}\).

Definition 2

(Average conditional min-Entropy [5]). For a pair (X, Z) of random variables, the average min-entropy of X conditioned on Z is

$$\begin{aligned} {\widetilde{{\mathbf {H}}}}_\infty (X|Z) = -\log {{\mathrm{{\mathbb {E}}}}}_{z\leftarrow Z}[\max _x\Pr [X=x|Z=z]] = -\log {{\mathrm{{\mathbb {E}}}}}_{z\leftarrow Z}[2^{-{\mathbf {H}}_{\infty }\left( X|Z=z\right) }] \end{aligned}$$
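As a concrete reference for Definitions 1 and 2 and the statistical distance above, here is a small Python sketch (a toy illustration, not part of the paper) evaluating these quantities on explicit probability tables:

```python
from math import log2

def min_entropy(px):
    """H_inf(X) = -log2 max_x Pr[X=x]; px maps outcomes to probabilities."""
    return -log2(max(px.values()))

def statistical_distance(p1, p2):
    """SD(X1; X2) = 1/2 * sum_x |Pr[X1=x] - Pr[X2=x]|."""
    support = set(p1) | set(p2)
    return 0.5 * sum(abs(p1.get(x, 0.0) - p2.get(x, 0.0)) for x in support)

def avg_cond_min_entropy(pxz):
    """Average conditional min-entropy of X given Z, where pxz maps pairs
    (x, z) to Pr[X=x, Z=z]; uses E_z[max_x Pr[X=x|Z=z]] = sum_z max_x Pr[X=x, Z=z]."""
    best = {}
    for (x, z), p in pxz.items():
        best[z] = max(best.get(z, 0.0), p)
    return -log2(sum(best.values()))

# toy example: X uniform on {0, 1, 2, 3}, leakage Z = X mod 2
px  = {x: 0.25 for x in range(4)}
pxz = {(x, x % 2): 0.25 for x in range(4)}
print(min_entropy(px))            # 2.0
print(avg_cond_min_entropy(pxz))  # 1.0: one bit of leakage costs one bit here
print(statistical_distance(px, {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}))  # 0.2
```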

Distinguishers. We consider several classes of distinguishers. With \({\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s\) we denote the class of randomized circuits of size at most s with boolean output (this is the standard non-uniform class of distinguishers considered in cryptographic definitions). The class \({\mathcal {D}}^{\mathsf {rand},[0,1]}_s\) is defined analogously, but with real-valued output in [0, 1]. \({\mathcal {D}}^{\mathsf {det},\{0,1\}}_s,{\mathcal {D}}^{\mathsf {det},[0,1]}_s\) are defined as the corresponding classes for deterministic circuits. With \({\varDelta }^D(X;Y)=|{{\mathrm{{\mathbb {E}}}}}_X[D(X)]-{{\mathrm{{\mathbb {E}}}}}_Y[D(Y)]|\) we denote D's advantage in distinguishing X and Y.

Definition 3

(HILL pseudoentropy [12, 15]). A variable X has HILL entropy at least k if

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL}}}_{\epsilon ,s}(X)\ge k\iff \exists Y,\ {\mathbf {H}}_{\infty }\left( Y\right) =k \ \forall D\in {\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s:{\varDelta }^D(X;Y)\le \epsilon \end{aligned}$$

For a joint distribution (X, Z), we say that X has k bits of conditional HILL entropy (conditioned on Z) if

$$\begin{aligned}&{\mathbf {H}}^{\mathsf{{HILL}}}_{\epsilon ,s}(X|Z)\ge k\\\iff & {} \exists (Y,Z),{\widetilde{{\mathbf {H}}}}_\infty (Y|Z)=k \ \forall D\in {\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s:{\varDelta }^D((X,Z);(Y,Z))\le \epsilon \end{aligned}$$

Definition 4

(Metric pseudoentropy [3]). A variable X has Metric entropy at least k if

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{Metric}}}_{\epsilon ,s}(X)\ge k\iff \forall D\in {\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s\exists Y_D\ ,\ {\mathbf {H}}_{\infty }\left( Y_D\right) =k \ :\ {\varDelta }^D(X;Y_D)\le \epsilon \end{aligned}$$

Metric star entropy is defined analogously but using deterministic real-valued distinguishers

$$\begin{aligned} {\mathbf {H}}^{{\textsf {Metric}}*}_{\epsilon ,s}(X)\ge k\iff \forall D\in {\mathcal {D}}^{\mathsf {det},[0,1]}_s\exists Y_D,\ {\mathbf {H}}_{\infty }\left( Y_D\right) =k :{\varDelta }^D(X;Y_D)\le \epsilon \end{aligned}$$

Relaxed Versions of HILL and Metric Entropy. A weaker notion of conditional HILL entropy allows the conditional part to be replaced by some computationally indistinguishable variable

Definition 5

(Relaxed HILL pseudoentropy [9, 23]). For a joint distribution (X, Z) we say that X has relaxed HILL entropy k conditioned on Z if

$$\begin{aligned}&{\mathbf {H}}^{\mathsf{{HILL-rlx}}}_{\epsilon ,s}(X|Z)\ge k\\\iff & {} \exists (Y,Z'), {\widetilde{{\mathbf {H}}}}_\infty (Y|Z')=k, \forall D\in {\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s\ :\ {\varDelta }^D((X,Z);(Y,Z'))\le \epsilon \end{aligned}$$

The above notion of relaxed HILL entropy satisfies a chain rule, whereas the chain rule for the standard definition of conditional HILL entropy is known to be false [18]. One can analogously define relaxed variants of Metric entropy; we won't give these as they will not be required in this paper.

Pseudoentropy Against Different Distinguisher Classes. For randomized distinguishers, it's irrelevant whether the output is boolean or real-valued, as we can replace any \(D\in {\mathcal {D}}^{\mathsf {rand},[0,1]}_s\) with a \(D'\in {\mathcal {D}}^{\mathsf {rand},\{0,1\}}_s\) s.t. \({{\mathrm{{\mathbb {E}}}}}[D'(X)]={{\mathrm{{\mathbb {E}}}}}[D(X)]\) by setting (for any x) \(\Pr [D'(x)=1]={{\mathrm{{\mathbb {E}}}}}[D(x)]\). For HILL entropy (as well as for its relaxed version), it also doesn't matter if we consider randomized or deterministic distinguishers in Definition 3, as we can always “fix” the randomness to an optimal value. This is no longer true for Metric entropy (Footnote 4), and thus the distinction between Metric and Metric star entropy is crucial.
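Spelled out, this replacement does not change any advantage because the expectations agree (a one-line check):

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}[D'(X)]=\sum _x \Pr [X=x]\cdot \Pr [D'(x)=1]=\sum _x \Pr [X=x]\cdot {{\mathrm{{\mathbb {E}}}}}[D(x)]={{\mathrm{{\mathbb {E}}}}}[D(X)], \end{aligned}$$

and likewise with Y in place of X, hence \({\varDelta }^{D'}(X;Y)={\varDelta }^{D}(X;Y)\).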

3 A Lower Bound on Metric-to-HILL Transformations

Theorem 1

For every n, k, m and \(\epsilon \) such that \(n \geqslant k + \log (1/\epsilon )+4\), \(\frac{1}{8}>\epsilon \) and \(n-1\ge m > 6\log (1/\epsilon )\) there exist an oracle \({{\mathcal {O}}}\) and a distribution X over \(\{0,1\}^n\) such that

$$\begin{aligned} {\mathbf {H}}^{\mathsf{Metric},\mathsf{det},\{0,1\}}_{\epsilon ,T}(X)\geqslant k \end{aligned}$$
(2)

where the complexity bound T covers any circuit of size \(2^{O(m)}\) that makes at most \(\frac{\ln (2/\epsilon )}{216\epsilon ^2} \) non-adaptive queries and, simultaneously,

$$\begin{aligned} {\mathbf {H}}^{\mathsf{Metric},\mathsf{rand},\{0,1\}}_{2\epsilon ,T'}(X)\leqslant m+1 \end{aligned}$$
(3)

where the distinguisher's size \(T'\) is only O(n) and its query complexity is 1.

Let S be a random subset of \(\{0,1\}^{n}\) of size \(2^{m}\), where \(m\leqslant n-1\), and let \(D_1,\ldots ,D_h\) be boolean functions drawn independently from the following distribution D: \(D(x)=1\) with probability p if \(x\in S\) and \(D(x)=1\) with probability q if \(x\in S^c\), where \(p>q\) and \(p+q=1\). Denote \(X=U_{S}\). We will argue that the Metric entropy against a probabilistic adversary who is allowed one query is roughly m with advantage \({\varOmega }(p-q)\), but the Metric entropy against a non-adaptive deterministic adversary who can make t queries of the form \(D_i(x)\) is much bigger, even if \(t = O\left( (p-q)^{-2}\right) \). Let us sketch an informal argument before we give the actual proof. We need to prove two facts:

  (i) There is a probabilistic adversary \({\mathsf {A}}^{*}\) such that with high probability over \(X,D_1,\ldots ,D_h\) we have \({\varDelta }^{{\mathsf {A}}^{*}}(X;Y) = {\varOmega }( p-q)\) for all Y with \({\mathbf {H}}_{\infty }\left( Y\right) \geqslant m+1\).

  (ii) For every deterministic adversary \({\mathsf {A}}\) making at most \(t=O\left( (p-q)^{-2}\right) \) non-adaptive queries, with high probability over \(X,D_1,\ldots ,D_h\) we have \({\varDelta }^{{\mathsf {A}}}(X;Y)= 0\) for some Y with \({\mathbf {H}}_{\infty }\left( Y\right) =n-{\varTheta }(1)\).

To prove (i) we observe that the probabilistic adversary can distinguish between S and \(S^c\) by exploiting the different biases towards 1. We simply let \({\mathsf {A}}^*\) forward its input to \(D_i\) for a randomly chosen i, i.e.,

$$\begin{aligned} {\mathsf {A}}^{*}(x) = D_i(x),\quad i\leftarrow [1,\ldots ,h] \end{aligned}$$

With extremely high probability we have \(\Pr [{\mathsf {A}}^{*}(x)=1] \in [p-\delta ,p+\delta ]\) if \(x\in S\) and \(\Pr [{\mathsf {A}}^{*}(x)=1] \in [q-\delta ,q+\delta ]\) if \(x\not \in S\) for some \(\delta \ll p-q\) (by a Chernoff bound, \(\delta \) drops exponentially fast in h, so we just have to set h large enough). We then have \(\Pr [{\mathsf {A}}^{*}(X)=1] \geqslant p-\delta \) and \(\Pr [{\mathsf {A}}^{*}(Y)=1] \leqslant 1/2\cdot ( p+q+2\delta )\) for every Y of min-entropy at least \(m+1\) (since then \(\Pr [Y\in S] \leqslant 1/2\)). This yields \({\varDelta }^{{\mathsf {A}}^{*}}(X;Y) \geqslant (p-q)/2 - 2\delta \approx (p-q)/2\). In order to prove (ii) one might intuitively argue that the best a t-query deterministic adversary can do to contradict (ii) is to guess whether some value x has bias p or \(q=1-p\) by taking the majority of t samples

$$\begin{aligned} {\mathsf {A}}(x) = \mathrm {Maj}(D_1(x),\ldots ,D_t(x)) \end{aligned}$$

But even if \(t={\varTheta }( 1/(p-q)^2)\), majority will fail to predict the bias with constant probability. This means there exists a variable Y with min-entropy \(n-{\varTheta }(1)\) such that \(\Pr [{\mathsf {A}}(Y)=1]=\Pr [{\mathsf {A}}(X)=1]\). The full proof gives quantitative forms of (i) and (ii), showing essentially that “majority is best”, and appears in Appendix A.
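For reference, the concentration inequality invoked in the sketch of (i) is the standard Chernoff-Hoeffding bound: for any fixed x, the empirical frequency of 1s among \(D_1(x),\ldots ,D_h(x)\) satisfies

$$\begin{aligned} \Pr \left[ \left| \tfrac{1}{h}\sum _{i=1}^{h} D_i(x) - {{\mathrm{{\mathbb {E}}}}}[D(x)]\right| \geqslant \delta \right] \leqslant 2e^{-2\delta ^2 h}, \end{aligned}$$

so choosing \(h = {\varOmega }\left( \delta ^{-2}\, n\right) \) suffices to make the deviation bound hold simultaneously for all \(2^n\) inputs x (except with exponentially small probability), by a union bound.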

4 Lower Bounds on Chain Rules

For any \(n\gg \ell \gg m\), we construct a distribution \((X,Z)\in \{0,1\}^n\times \{0,1\}^m\) and an oracle \({{\mathcal {O}}}(.)\) such that relative to this oracle, X has very large HILL entropy but the HILL entropy of X|Z is much lower in quantity and quality: for arbitrary \(n\gg \ell \gg m\) (where \(|Z|=m\), \(X\in \{0,1\}^n\)), the quantity drops from n to \(\ell -m+2\) (in particular, by much more than \(|Z|=m\)), even if we allow for a \(2^m/\epsilon \) drop in quality.

Theorem 2

(A lower bound on the chain rule for \({\mathbf {H}}^{\mathsf{{HILL-rlx}}}\) ). There exists a joint distribution (X, Z) over \(\{0,1\}^n\times \{0,1\}^m\) and an oracle \({{\mathcal {O}}}\) such that, relative to \({{\mathcal {O}}}\), for any \((\ell ,\delta )\) such that \(\frac{n}{2} - \frac{\log (1/\delta )}{2} > m\) and \(\ell > m + 6\log (1/\delta ) \), we have

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL}}}_{{\delta }/{2},T}(X) = n \end{aligned}$$
(4)

where \(T > c\cdot 2^{m}/\delta \) for some absolute constant c (Footnote 5), but

$$\begin{aligned} {\mathbf {H}}^{\mathsf{{HILL-rlx}}}_{\delta /2,T'}(X|Z) < \ell +1 \end{aligned}$$
(5)

where \(T' \) captures a circuit of size only O(n) making only 1 oracle query.

Remark 1

(On the technical restrictions). Note that the assumptions on \(\ell \) and \(\delta \) are automatically satisfied in most interesting settings, as typically we assume \(m\ll n\) and \(\log (1/\delta ) \ll n\).

Remark 2

(A strict separation). The theorem also holds if we insist on a larger distinguishing advantage after leakage. Concretely, allowing for more than just one oracle query, the \(\delta /2\) advantage in (5) can be amplified to \(C\delta \) for any constant C assuming \(\delta \) is small enough to start with (see Remark 4 in the proof).

The full proof appears in Appendix B. The heart of the argument is a lower bound on the query complexity for the corresponding “coin problem”: we need to distinguish T uniformly random bits from the distribution where, with probability 1/2 each, we sample T independent bits from \(B_p\) or T independent bits from \(B_q\), where \(p=\frac{1}{2}+\delta \) and \(q=1-p\) (see Appendix C for more details). The rest of the proof is based on a standard concentration argument, using Chernoff bounds extensively.

5 Open Problems

As shown in Fig. 2, there remains a gap between the best proofs for the chain rule, which lose a factor \(2^{|Z|}/\epsilon ^2\) in circuit size, and the loss of \(2^{|Z|}/\epsilon \) that we prove necessary in this paper. Closing this gap, by either improving the proof for the chain rule or giving an improved lower bound, remains an intriguing open problem.

Our lower bounds are only proven for adversaries that make their queries non-adaptively. Adaptive queries don’t seem to help against our oracle, but rigorously proving this fact seems tricky.

Finally, the lower bounds we prove on the loss of circuit size assume that the distinguishing advantage remains roughly the same. There exist results which are not of this form, in particular – as shown in Fig. 2 – the HILL to Metric transformation from [8] only loses in distinguishing advantage, not in circuit size (i.e., we have \(s\approx s'\)). Proving lower bounds and giving constructions for different circuit size vs. distinguishing advantage trade-offs leave many challenges for future work.