
1 Introduction

A central challenge in privacy research is to generate rich private summaries of a sensitive dataset. Doing so creates a tension between two competing goals. On one hand we would like to ensure differential privacy [22]—a strong notion of individual privacy that guarantees no individual’s data has a significant influence on the summary. On the other hand, the summary should enable a user to obtain approximate answers to some large set of queries Q. Since the summary must be generated without knowing which queries the user will need to answer, we would like Q to be very large. This problem is sometimes called non-interactive query release, in contrast with interactive query release, where the user is required to specify the (much smaller) set of queries that he needs to answer in advance, and the private answers may be tailored to just these queries.

More specifically, there is a sensitive dataset \(D = (D_1,\dots ,D_n) \in X^n\) where each element of D is the data of some individual, and comes from some data universe X. We are interested in generating a summary that allows the user to answer statistical queries on D, which are queries of the form “What fraction of the individuals in the dataset satisfy some property q?” [38]. Given a set of statistical queries Q and a data universe X, we would like to design a differentially private algorithm M that takes a dataset \(D \in X^n\) and outputs a summary that can be used to obtain an approximate answer to every query in Q. Since differential privacy requires hiding the information of single individuals, for a fixed (X, Q) generating a private summary becomes easier as n becomes larger. The overarching goal is to find algorithms that are both private and accurate for X and Q as large as possible and n as small as possible.

Since differential privacy is a strong guarantee, a priori we might expect differentially private algorithms to be very limited. However, a seminal result of Blum et al. [7] showed how to generate a differentially private summary encoding answers to exponentially many queries. After a series of improvements and extensions [23, 26, 33,34,35, 41, 43, 49], we know that any set of queries Q over any universe X can be answered given a dataset of size \(n \gtrsim \sqrt{\log |X|} \cdot \log |Q|\) [35]. Thus, it is information-theoretically possible to answer huge sets of queries using a small dataset.

Unfortunately, all of these algorithms have running time \(\mathrm {poly}(n,|X|,|Q|)\), which can be exponential in the dimension of the dataset, and in the description of a query. For example if \(X = \{0,1\}^{d}\), so each individual’s data consists of d binary attributes, then the dataset has size nd but the running time will be at least \(2^d\). Thus, these algorithms are only efficient when both |X| and |Q| have polynomial size. There are computationally efficient algorithms when one of |Q| and |X| is very large, provided that the other is extremely small—at most \(n^{2-\varOmega (1)}\). Specifically, (1) the classical technique of perturbing the answer to each query with independent noise requires a dataset of size \(n \gtrsim \sqrt{|Q|}\) [6, 19, 22, 25], and (2) the folklore noisy histogram algorithm (see e.g. [52]) requires a dataset of size \(n \gtrsim \sqrt{|X| \cdot \log |Q|}\). Thus there are huge gaps between the power of information-theoretic and computationally efficient differentially private algorithms.
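To illustrate the two efficient baselines, the following minimal Python sketch (our own illustration; the function names and the Laplace noise calibration are simplified choices, not taken from the cited works) answers queries either by independent perturbation or from a noisy histogram.

    import numpy as np

    def independent_perturbation(D, queries, epsilon):
        # Perturb each empirical answer with independent Laplace noise.
        # Each answer has sensitivity 1/n; here the privacy budget is
        # naively split across the |Q| queries (basic composition).
        n = len(D)
        scale = len(queries) / (epsilon * n)
        return [sum(q(x) for x in D) / n + np.random.laplace(0, scale)
                for q in queries]

    def noisy_histogram(D, universe, epsilon):
        # Release a noisy frequency for every element of X; any statistical
        # query can then be answered from the histogram alone.
        n = len(D)
        return {u: sum(1 for x in D if x == u) / n
                   + np.random.laplace(0, 2 / (epsilon * n))
                for u in universe}

    def answer_from_histogram(noisy_hist, q):
        return sum(freq * q(u) for u, freq in noisy_hist.items())

With more careful noise calibration, the per-query error of the first approach scales roughly as \(\sqrt{|Q|}/n\), which is where the requirement \(n \gtrsim \sqrt{|Q|}\) comes from; the noisy histogram instead pays for privacy only once per universe element, after which every query in Q can be answered with no further privacy loss.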

Beginning with the work of Dwork et al. [23], there has been a series of results giving evidence that this gap is inherent, using a connection to traitor-tracing schemes [16]. The first such result, by Dwork et al. [23], gave the first separation between efficient and inefficient differentially private algorithms, proving a polynomial-factor gap in sample complexity between the two cases assuming bilinear cryptography. Subsequently, Boneh and Zhandry [10] proved that, under the much stronger assumption that indistinguishability obfuscation (iO) exists, for a worst-case family of statistical queries there is no computationally efficient algorithm with \(\mathrm {poly}(\log |Q| + \log |X|)\) sample complexity. More recently, Kowalczyk et al. [40] strengthened these results to show that the two efficient algorithms mentioned above—independent perturbation and the noisy histogram—are optimal up to polynomial factors, also assuming iO.

These results give a relatively clean picture of the complexity of non-interactive differential privacy, but only if we assume the existence of iO. Recently, in his survey on the foundations of differential privacy [52], Vadhan posed it as an open question to prove hardness of non-interactive differential privacy using standard cryptographic assumptions. In this work, we resolve this open question by proving a strong hardness result for non-interactive differential privacy making only the standard assumption that one-way functions (OWF) exist.

Theorem 1

There is a sequence of pairs \(\{(X_\kappa , Q_\kappa ) \}_{\kappa \in \mathbb {N}}\) where

$$\begin{aligned} |X_\kappa | = 2^{2^{\mathrm {poly}(\log \log \kappa )}} = 2^{\kappa ^{o(1)}}, \quad |Q_\kappa | = 2^{\kappa } \end{aligned}$$

such that, assuming the existence of one-way functions, for every polynomial \(n = n(\kappa )\), there is no polynomial time differentially private algorithm that takes a dataset \(D \in X_{\kappa }^n\) and outputs an accurate answer to every query in \(Q_\kappa \) up to an additive error of \(\pm 1/3\).

We remark that, in addition to removing the assumption of iO, Theorem 1 is actually stronger than that of Boneh and Zhandry [10], since the data universe size can be subexponential in \(\kappa \), even if we only make standard polynomial-time hardness assumptions. We leave it as an interesting open question to obtain quantitatively optimal hardness results matching (or even improving) those of [40] using standard assumptions. Table 1 summarizes existing hardness results as compared to our work.

Table 1. Comparison of Hardness Results for Offline Differentially Private Query Release. Each row corresponds to an informal statement of the form “If the assumption holds, then there is no general purpose differentially private algorithm that works when the data universe has size at least |X|, the number of queries is at least |Q|, and the size of the dataset is at most n.” All assumptions are polynomial-time hardness.

Like all of the aforementioned hardness results, the queries constructed in Theorem 1 are somewhat complex, and involve computing some cryptographic functionality. A major research direction in differential privacy has been to construct efficient non-interactive algorithms for specific large families of simple queries, or prove that this problem is hard. The main technique for constructing such algorithms has been to leverage efficient PAC learning algorithms. Specifically, a series of works [7, 32, 33, 36] have shown that an efficient PAC learning algorithm for a class of concepts related to Q can be used to obtain efficient differentially private algorithms for answering the queries Q. Thus, hardness results for differential privacy imply hardness results for PAC learning. However, it is relatively easy to show the hardness of PAC learning using just OWFs [42], and one can even show the hardness of learning simple concept classes (e.g. DNF formulae [17, 18]) by using more structured complexity assumptions. One roadblock to proving hardness results for privately answering simple families of queries is that, prior to our work, even proving hardness results for worst-case families of queries required using extremely powerful cryptographic primitives like iO, leaving little room to utilize more structured complexity assumptions to obtain hardness for simple queries. By proving hardness results for differential privacy using only the assumption of one-way functions, we believe our results are an important step towards proving hardness results for simpler families of queries.

Relationship to [31]. A concurrent and independent work by Goyal, Koppula, and Waters also shows how to prove hardness results for non-interactive differential privacy from weaker assumptions than iO. Specifically, they propose a new primitive called risky traitor tracing that has weaker security than standard traitor tracing, but is still strong enough to rule out the existence of computationally efficient differentially private algorithms, and construct such schemes under certain assumptions on composite-order bilinear maps. Unlike our work, their new primitive has applications outside of differential privacy. However, within the context of differential privacy, Theorem 1 is stronger than what they prove in two respects: (1) their bilinear-map assumptions are significantly stronger than our assumption of one-way functions, and (2) their hardness result requires a data universe of size \(|X_{\kappa }| = \exp (\kappa )\), rather than our result, which allows \(|X_{\kappa }| = \exp (\kappa ^{o(1)})\).

1.1 Techniques

Differential Privacy and Traitor-Tracing Schemes. Our results build on the connection between differentially private algorithms for answering statistical queries and traitor-tracing schemes, which was discovered by Dwork et al. [23]. Traitor-tracing schemes were introduced by Chor et al. [16] for the purpose of identifying pirates who violate copyright restrictions. Roughly speaking, a (fully collusion-resilient) traitor-tracing scheme allows a sender to generate keys for n users so that (1) the sender can broadcast encrypted messages that can be decrypted by any user, and (2) any efficient pirate decoder capable of decrypting messages can be traced to at least one of the users who contributed a key to it, even if an arbitrary coalition of the users combined their keys in an arbitrary efficient manner to construct the decoder.

Dwork et al. show that the existence of traitor-tracing schemes implies hardness results for differential privacy. Very informally, they argue as follows. Suppose a coalition of users takes their keys and builds a dataset \(D \in X^n\) where each element of the dataset contains one of their user keys. The family Q will contain a query \(q_c\) for each possible ciphertext c. The query \(q_c\) asks “What fraction of the elements (user keys) in D would decrypt the ciphertext c to the message 1?” Every user can decrypt, so if the sender encrypts a message \(b \in \{0,1\}\) as a ciphertext c, then every user will decrypt c to b. Thus, the answer to the statistical query \(q_c\) will be b. Now, suppose there were an efficient algorithm that outputs an accurate answer to each query \(q_c\) in Q. Then the coalition could use it to efficiently produce a summary of the dataset D that enables one to efficiently compute an approximate answer to every query \(q_c\), which would also allow one to efficiently decrypt the ciphertext. Such a summary can be viewed as an efficient pirate decoder, and thus the tracing algorithm can use the summary to trace one of the users in the coalition. However, if there is a way to identify one of the users in the dataset from the summary, then the summary is not private.
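In pseudocode, the reduction looks as follows (purely illustrative: Setup, Enc, Dec, and Trace stand for an unspecified traitor-tracing scheme, and private_summary for a hypothetical efficient differentially private algorithm):

    def attack(Setup, Enc, Dec, Trace, private_summary, n):
        # The coalition's dataset: one row per user key.
        user_keys, msk = Setup(n)
        D = list(user_keys)

        # One statistical query per ciphertext c: "what fraction of the
        # rows (user keys) of D decrypt c to 1?"
        def q(c):
            return lambda sk: Dec(sk, c)

        # A summary answering every q_c to within +/- 1/3 ...
        S = private_summary(D)  # S maps a query to an approximate answer

        # ... is a pirate decoder: every key decrypts an encryption of b
        # to b, so the true answer to q_c is b and rounding recovers it.
        def pirate_decoder(c):
            return round(S(q(c)))

        # Tracing the decoder identifies a user in D, violating privacy.
        return Trace(pirate_decoder, msk)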

Hardness of Privacy from OWF. In order to instantiate this outline, we need a sufficiently good traitor-tracing scheme. Traitor-tracing schemes can be constructed from any functional encryption scheme for comparison functions [8]. This is a cryptographic scheme in which secret keys are associated with functions f and ciphertexts are associated with a message x, and decrypting the ciphertext with a secret key corresponding to f reveals f(x) and “nothing else.” In our application, the functions are of the form \(f_{z}\) where \(f_{z}(x) = 1\) if and only if \(x \ge z\) (as integers).

Using techniques from [40] (also closely related to arguments in [14]), we show that, in order to prove hardness results for differentially private algorithms it suffices to have a functional encryption scheme for comparison functions that is non-adaptively secure for just two ciphertexts and n secret keys. That is, if an adversary chooses to receive keys for n functions \(f_1,\dots ,f_n\), and ciphertexts for two messages \(x_1,x_2\), then he learns nothing more than \(\{f_i(x_1), f_i(x_2)\}_{i \in [n]}\). Moreover, the comparison functions only need to support inputs in \(\{0,1,\dots ,n\}\) (i.e. \(\log n\) bits). Lastly, it suffices for us to have a symmetric-key functional encryption scheme where both the encryption and key generation can require a private master secret key.

We then construct this type of functional encryption (FE) using the techniques of Gorbunov et al. [30] who constructed bounded-collusion FE from any public-key encryption. There are two important differences between the type of FE that we need and bounded-collusion FE in [30]: (1) we want a symmetric-key FE based on one-way functions (OWFs), whereas they constructed public-key FE using public-key encryption, and (2) we want security for only 2 ciphertexts but many secret keys, whereas they achieved security for many ciphertexts but only a small number of secret keys. It turns out that their construction can be rather easily scaled down from the public-key to the symmetric-key setting by replacing public-key encryption with symmetric-key encryption (as previously observed by, e.g., [11]). Going from many ciphertexts and few secret keys to many secret keys and few ciphertexts essentially boils down to exchanging the role of secret keys and ciphertexts in their scheme, but this requires care. We give the full description and analysis of this construction. Lastly, we rely on one additional property: for the simple functions we consider with logarithmic input length, we can get a scheme where the ciphertext size is extremely small, \(\kappa ^{o(1)}\), where \(\kappa \) is the security parameter, while being able to rely on standard polynomial hardness of OWFs. To do so, we replace the garbled circuits used in the construction of [30] with information-theoretic randomized encodings for simple functions and leverage the fact that we are in the more restricted nonadaptive secret-key setting. The resulting small ciphertext size allows us to get DP lower bounds even when the data universe is of size \(|X| = \exp (\kappa ^{o(1)})\).

We remark that Tang and Zhang [47] proved that any black-box construction of a traitor-tracing scheme from a random oracle must have either keys or ciphertexts of length \(n^{\varOmega (1)}\), provided that the scheme does not make calls to the random oracle when generating the user keys. Our construction uses one-way functions during key generation, and thus circumvents this barrier.

Why Two-Ciphertext Security? In the hardness reduction sketched above, the adversary for the functional encryption scheme will use the efficient differentially private algorithm to output some stateless program (the summary) that correctly decrypts ciphertexts for the functional encryption scheme (by approximately answering statistical queries). The crux of the proof is to use differential privacy to argue that the scheme must violate security of the functional encryption scheme by distinguishing encryptions of the messages x and \(x-1\) even when it does not possess a secret key for the function \(f_{x}\). Note that \(f_{x}\) is the only function in the family of comparison functions that gives different outputs on these two messages, so an adversary without this key should not be able to distinguish between them.

Thus, in order to obtain a hardness result for differential privacy we need a functional encryption scheme with the following non-standard security definition: for every polynomial time adversary that obtains a set of secret keys corresponding to functions other than \(f_{x}\) and outputs some stateless program, with high probability that program has small advantage in distinguishing encryptions of x from \(x - 1\). Implicit in the work of Kowalczyk et al. [40] is a lemma that says that this property is satisfied by any functional encryption scheme that satisfies the standard notion of security for two messages. At a high level, one-message security allows for the possibility that the adversary sometimes outputs a program with large positive advantage and sometimes outputs a program with large negative advantage, whereas two-message security bounds the average squared advantage, meaning that the advantage must be small with high probability. This argument is similar to one used by Dodis and Yu [20] in a completely different setting.
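To make this step concrete, let A denote the advantage of the adversary’s program, viewed as a random variable over the coins of the setup and the adversary. If two-message security bounds \({ \mathbb {E} \left[ A^2\right] } \le \delta \), then applying Markov’s inequality to \(A^2\) gives, for any threshold t,

$$\begin{aligned} { \mathbb {P} \left[ |A| \ge t\right] } = { \mathbb {P} \left[ A^2 \ge t^2\right] } \le \frac{{ \mathbb {E} \left[ A^2\right] }}{t^2} \le \frac{\delta }{t^2}, \end{aligned}$$

so with the illustrative values \(\delta = 1/300n^3\) and \(t = 1/4n\), the probability of advantage larger than \(1/4n\) is at most \(16/300n \le 1/4n\). By contrast, a bound on \(\left| { \mathbb {E} \left[ A\right] } \right| \) alone is consistent with |A| being large with constant probability, with positive and negative advantages canceling in expectation.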

1.2 Additional Related Work

(Hardness of) Interactive Differential Privacy. Another area of focus is interactive differential privacy, where the mechanism gets the dataset D and a (relatively small) set of queries Q chosen by the analyst and must output answers to each query in Q. Most differentially private algorithms for answering a large number of arbitrary queries actually work in this setting [23, 26, 34], or even in a more challenging setting where the queries in Q arrive online and may be adaptively chosen [33, 35, 43, 49]. Ullman [50] showed that, assuming one-way functions exist, there is no polynomial-time differentially private algorithm that takes a dataset \(D \in X^n\) and a set of \(\tilde{O}(n^2)\) arbitrary statistical queries and outputs an accurate answer to each of these queries. The hardness of interactive differential privacy has also been extended to a seemingly easier model of interactive data analysis [37, 45], which is closely related to differential privacy [3, 21], even though privacy is not an explicit requirement in that model. These results however do not give any specific set of queries Q that can be privately summarized information-theoretically but not by a computationally efficient algorithm, and thus do not solve the problem addressed in this work.

The Complexity of Simple Statistical Queries. As mentioned above, a major open research direction is to design non-interactive differentially private algorithms for simple families of statistical queries. For example, there are polynomial time differentially private algorithms with polynomial sample complexity for summarizing point queries and threshold queries [5, 12], using an information-theoretically optimal number of samples. Another family that has received considerable attention is marginal queries [15, 24, 32, 36, 48]. A marginal query is defined on the data universe \(\{0,1\}^{\kappa }\). It is specified by a set of positions \(S \subseteq \left\{ 1,\dots ,\kappa \right\} \) and a pattern \(t \in \{0,1\}^{|S|}\), and asks “What fraction of elements of the dataset have each coordinate \(j \in S\) set to \(t_j\)?” Specifically, Thaler et al. [48], building on the work of Hardt et al. [36], gave an efficient differentially private algorithm for answering all marginal queries up to an additive error of \(\pm .01\) when the dataset is of size \(n \gtrsim 2^{\sqrt{\kappa }}\). If we assume sufficiently hard one-way functions exist, then Theorem 1 shows that these parameters are not achievable for an arbitrary set of queries. It remains a central open problem in differential privacy to either design an optimal computationally efficient algorithm for marginal queries or to give evidence that this problem is hard.
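For concreteness, a marginal query is just a conjunction predicate, which can be written and evaluated as in the following short sketch (our own illustration; positions are 0-indexed in the code):

    def marginal_query(S, t):
        # Positions S and pattern t specify the predicate; a row x in
        # {0,1}^kappa satisfies it iff x agrees with t on every j in S.
        def q(x):
            return int(all(x[j] == t_j for j, t_j in zip(S, t)))
        return q

    def marginal_answer(D, S, t):
        # The answer is the fraction of rows matching the pattern.
        q = marginal_query(S, t)
        return sum(q(x) for x in D) / len(D)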

Hardness of Synthetic Data. There have been several other attempts to explain the accuracy vs. computation tradeoff in differential privacy by considering restricted classes of algorithms. For example, Ullman and Vadhan [51] (building on Dwork et al. [23]) show that, assuming one-way functions, no differentially private and computationally efficient algorithm that outputs a synthetic dataset can accurately answer even the very simple family of 2-way marginals. A synthetic dataset is a specific type of summary that is interchangeable with the real dataset—it is a set \(\hat{D} = (\hat{D}_1,\dots ,\hat{D}_{n}) \in X^{n}\) such that the answer to each query on \(\hat{D}\) is approximately the same as the answer to the same query on D. 2-way marginals are just the subset of marginal queries above where we only allow \(|S| \le 2\), and these queries capture the means and covariances of the attributes. This result is incomparable to ours, since it applies to a very small and simple family of statistical queries, but only applies to algorithms that output synthetic data.

Information-Theoretic Lower Bounds. A related line of work [1, 4, 13, 29, 46] uses ideas from fingerprinting codes [9] to prove information-theoretic lower bounds on the number of queries that can be answered by differentially private algorithms, and also devise realistic attacks against the privacy of algorithms that attempt to answer too many queries [27, 28]. Most relevant to this work is the result of [13] which says that if the size of the data universe is \(2^{n^2}\), then there is a fixed set of \(n^2\) queries that no differentially private algorithm, even a computationally unbounded one, can answer accurately. Although these results are orthogonal to ours, the techniques are quite related, as fingerprinting codes are essentially the information-theoretic analogue of traitor-tracing schemes.

2 Differential Privacy Preliminaries

2.1 Differentially Private Algorithms

A dataset \(D \in X^{n}\) is an ordered set of n rows, where each row corresponds to an individual, and each row is an element of some data universe \(X\). We write \(D = (D_1,\dots ,D_n)\) where \(D_i\) is the i-th row of D. We will refer to n as the size of the dataset. We say that two datasets \(D, D' \in X^*\) are adjacent if \(D'\) can be obtained from D by the addition, removal, or substitution of a single row, and we denote this relation by \(D \sim D'\). In particular, if we remove the i-th row of D then we obtain a new dataset \(D_{-i} \sim D\). Informally, an algorithm A is differentially private if it is randomized and for any two adjacent datasets \(D \sim D'\), the distributions of A(D) and \(A(D')\) are similar.

Definition 1

(Differential Privacy [22]). Let \(A :X^n \rightarrow S\) be a randomized algorithm. We say that A is \((\varepsilon , \delta )\)-differentially private if for every two adjacent datasets \(D \sim D'\) and every \(E \subseteq S\),

$$\begin{aligned} { \mathbb {P} \left[ A(D) \in E\right] } \le e^{\varepsilon } \cdot { \mathbb {P} \left[ A(D') \in E\right] } + \delta . \end{aligned}$$

In this definition, \(\varepsilon , \delta \) may be functions of n.
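As a simple illustration of the definition (a textbook example, not a construction from this paper), randomized response reports each individual’s bit flipped with some fixed probability. Since each report depends only on that individual’s row, changing one row changes the probability of any output by a factor of at most \(p/(1-p)\), so the mechanism is \((\ln \frac{p}{1-p}, 0)\)-differentially private. A minimal sketch:

    import numpy as np

    def randomized_response(D, p=2/3):
        # Report each true bit with probability p and the flipped bit
        # otherwise; p = 2/3 gives (ln 2, 0)-differential privacy.
        D = np.asarray(D)
        flips = np.random.rand(len(D)) < (1 - p)
        return np.where(flips, 1 - D, D)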

2.2 Algorithms for Answering Statistical Queries

In this work we study algorithms that answer statistical queries (which are also sometimes called counting queries, predicate queries, or linear queries in the literature). For a data universe \(X\), a statistical query on \(X\) is defined by a predicate \(q :X\rightarrow \{0,1\}\). Abusing notation, we define the evaluation of a query q on a dataset \(D = (D_1,\dots ,D_n) \in X^n\) to be

$$\begin{aligned} q(D) = \frac{1}{n} \sum _{i=1}^{n} q(D_i). \end{aligned}$$

A single statistical query does not provide much useful information about the dataset. However, a sufficiently large and rich set of statistical queries is sufficient to implement many natural machine learning and data mining algorithms [38], thus we are interested in differentially private algorithms to answer such sets. To this end, let \(Q = \left\{ q :X\rightarrow \{0,1\}\right\} \) be a set of statistical queries on a data universe \(X\).

Informally, we say that a mechanism is accurate for a set Q of statistical queries if it answers every query in the family to within error \(\pm \alpha \) for some suitable choice of \(\alpha > 0\). Note that \(0 \le q(D) \le 1\), so this definition of accuracy is meaningful when \(\alpha < 1/2\).

Before we define accuracy, we note that the mechanism may represent its answers in any form. That is, the mechanism may output a summary \(S \in \mathcal {S}\) that somehow represents the answers to every query in Q. We then require that there is an evaluator \( Eval :\mathcal {S}\times Q\rightarrow [0,1]\) that takes the summary and a query and outputs an approximate answer to that query. That is, we think of \( Eval (S,q)\) as the mechanism’s answer to the query q. We will abuse notation and simply write q(S) to mean \( Eval (S,q)\).
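For instance, one concrete type of summary is a synthetic dataset (cf. Sect. 1.2), for which the evaluator simply evaluates the query on the synthetic rows. A minimal sketch of this interface (our own illustration):

    class SyntheticDataSummary:
        # A summary S represented as a (possibly fake) dataset of rows,
        # chosen so that q(S) approximates q(D) for every q in Q.
        def __init__(self, rows):
            self.rows = rows

        def eval(self, q):
            # Eval(S, q), written q(S) in the text.
            return sum(q(x) for x in self.rows) / len(self.rows)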

Definition 2

(Accuracy). For a family Q of statistical queries on \(X\), a dataset \(D \in X^n\) and a summary \(S \in \mathcal {S}\), we say that S is \(\alpha \)-accurate for Q on D if

$$\begin{aligned} \forall q \in Q \qquad \left| q(D) - q(S) \right| \le \alpha . \end{aligned}$$

For a family of statistical queries Q on \(X\), we say that an algorithm \(A :X^n \rightarrow S\) is \((\alpha , \beta )\)-accurate for Q given a dataset of size n if for every \(D \in X^n\),

$$\begin{aligned} { \mathbb {P} \left[ A(D)\,\, \text {is}\,\, \alpha \text {-accurate for}\,\, Q\,\, \text {on}\,\, D\right] }\,\, \ge 1-\beta . \end{aligned}$$

In this work we are typically interested in mechanisms that satisfy the very weak notion of \((1 \slash 3, o(1 \slash n))\)-accuracy, where the constant \(1 \slash 3\) could be replaced with any constant less than \(1 \slash 2\). Most differentially private mechanisms satisfy quantitatively much stronger accuracy guarantees. Since we are proving hardness results, this choice of parameters makes our results stronger.

2.3 Computational Efficiency

Since we are interested in asymptotic efficiency, we introduce a computation parameter \(\kappa \in \mathbb {N}\). We then consider a sequence of pairs \(\{ (X_{\kappa }, Q_{\kappa }) \}_{\kappa \in \mathbb {N}}\) where \(Q_{\kappa }\) is a set of statistical queries on \(X_{\kappa }\). We consider databases of size n where \(n = n(\kappa )\) is a polynomial. We then consider algorithms A that take as input a dataset \(D \in X_{\kappa }^n\) and output a summary in \(S_{\kappa }\), where \(\{ S_{\kappa } \}_{\kappa \in \mathbb {N}}\) is a sequence of output ranges. There is an associated evaluator \(\mathsf {Eval}\) that takes a query \(q \in Q_{\kappa }\) and a summary \(s \in S_{\kappa }\) and outputs a real-valued answer. The definitions of differential privacy and accuracy extend straightforwardly to such sequences.

We say that such an algorithm is computationally efficient if the algorithm and the associated evaluator both run in time polynomial in the computation parameter \(\kappa \). In principle, it could require as many as |X| bits even to specify a statistical query, in which case we cannot hope to answer the query efficiently, even ignoring privacy constraints. Thus, we restrict attention to statistical queries that are specified by a circuit of size \(\mathrm {polylog}|X|\), and thus can be evaluated in time \(\mathrm {polylog}|X|\), and so are not the bottleneck in computation. To remind the reader of this fact, we will often say that Q is a family of efficiently computable statistical queries.

2.4 Notational Conventions

Given a boolean predicate P, we will write \(\mathbb {I}\{P\}\) to denote the value 1 if P is true and 0 if P is false. We also say that a function \(\varepsilon = \varepsilon (n)\) is negligible if \(\varepsilon (n) = O(1/n^c)\) for every constant \(c > 0\), and denote this by \(\varepsilon (n) = \mathrm {negl}(n)\).

3 Weakly Secure Traitor-Tracing Schemes

In this section we describe a very relaxed notion of traitor-tracing schemes whose existence will imply the hardness of differentially private data release.

3.1 Syntax and Correctness

For a function \(n :\mathbb {N}\rightarrow \mathbb {N}\) and a sequence \(\{ K_\kappa , C_\kappa \}_{\kappa \in \mathbb {N}}\), an \((n, \{K_\kappa , C_\kappa \})\)-traitor-tracing scheme is a tuple of efficient algorithms \(\varPi = (\mathsf {Setup}, \mathsf {Enc}, \mathsf {Dec})\) with the following syntax.

  • \(\mathsf {Setup}\) takes as input a security parameter \(\kappa \), runs in time \(\mathrm {poly}(\kappa )\), and outputs \(n = n(\kappa )\) secret user keys \(\mathsf {sk}_1,\dots ,\mathsf {sk}_n \in K_\kappa \) and a secret master key \(\mathsf {msk}\). We will write \(\vec{\mathsf {sk}} = (\mathsf {sk}_1,\dots ,\mathsf {sk}_n,\mathsf {msk})\) to denote the set of keys.

  • \(\mathsf {Enc}\) takes as input a master key \(\mathsf {msk}\) and an index \(j \in \left\{ 0,1,\dots ,n\right\} \), and outputs a ciphertext \(c \in C_\kappa \). If \(c \leftarrow _\mathrm{{\tiny R}}\mathsf {Enc}(\mathsf {msk}, j)\) then we say that c is encrypted to index j.

  • \(\mathsf {Dec}\) takes as input a ciphertext c and a user key \(\mathsf {sk}_i\) and outputs a single bit \(b \in \{0,1\}\). We assume for simplicity that \(\mathsf {Dec}\) is deterministic.

Correctness of the scheme asserts that if \(\vec{\mathsf {sk}} = (\mathsf {sk}_1,\dots ,\mathsf {sk}_n,\mathsf {msk})\) is generated by \(\mathsf {Setup}\), then for any pair \(i \in [n]\), \(j \in \left\{ 0,1,\dots ,n\right\} \), we have \(\mathsf {Dec}(\mathsf {sk}_i, \mathsf {Enc}(\mathsf {msk}, j)) = \mathbb {I}\{i \le j\}\). For simplicity, we require that this property holds with probability 1 over the coins of \(\mathsf {Setup}\) and \(\mathsf {Enc}\), although it would not affect our results substantively if we required only correctness with high probability.
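To illustrate the syntax and the correctness condition only, a strawman instantiation might look as follows (our own toy example, with n passed directly instead of \(\kappa \); it provides no security whatsoever and is not the scheme constructed in this paper):

    import secrets

    def Setup(n):
        # Toy scheme: user i's key is the pair (i, k) for a shared secret k.
        k = secrets.token_bytes(16)
        user_keys = [(i, k) for i in range(1, n + 1)]
        msk = k
        return user_keys, msk

    def Enc(msk, j):
        # "Encrypt" the index j in the clear (a real scheme must hide j
        # from coalitions missing the relevant key; see Sect. 3.2).
        return (j, msk)

    def Dec(sk, c):
        i, k = sk
        j, tag = c
        assert tag == k
        return int(i <= j)  # perfect correctness: Dec = I{i <= j}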

Definition 3

(Perfect Correctness). An \((n, \{K_\kappa , C_\kappa \})\)-traitor-tracing scheme is perfectly correct if for every \(\kappa \in \mathbb {N}\), every \(i \in [n]\), and every \(j \in \left\{ 0,1,\dots ,n\right\} \),

$$\begin{aligned} { \mathbb {P} \left[ \mathsf {Dec}(\mathsf {sk}_i, \mathsf {Enc}(\mathsf {msk}, j)) = \mathbb {I}\{i \le j\}\right] } = 1, \end{aligned}$$

where the probability is taken over \(\vec{\mathsf {sk}} \leftarrow _\mathrm{{\tiny R}}\mathsf {Setup}\) and the coins of \(\mathsf {Enc}\).

3.2 Index-Hiding Security

Intuitively, the security property we want is that any computationally efficient adversary who is missing one of the user keys \(\mathsf {sk}_{i^*}\) cannot distinguish ciphertexts encrypted with index \(i^*\) from index \(i^*-1\), even if that adversary holds all \(n-1\) other keys \(\mathsf {sk}_{-i^*}\). In other words, an efficient adversary cannot infer anything about the encrypted index beyond what is implied by the correctness of decryption and the set of keys he holds.

More precisely, consider the following two-phase experiment. First the adversary is given every key except for \(\mathsf {sk}_{i^*}\), and outputs a decryption program S. Then, a challenge ciphertext is encrypted to either \(i^*\) or to \(i^*-1\). We say that the traitor-tracing scheme is secure if for every polynomial time adversary, with high probability over the setup and the decryption program chosen by the adversary, the decryption program has small advantage in distinguishing the two possible indices.

Definition 4

(Index Hiding). A traitor-tracing scheme \(\varPi \) satisfies (weak) index-hiding security if for every sufficiently large \(\kappa \in \mathbb {N}\), every \(i^* \in [n(\kappa )]\), and every \(\mathrm {poly}(\kappa )\)-time adversary A,

$$\begin{aligned} \mathop {\mathbb {P}}_{\begin{array}{c} \vec{\mathsf {sk}} \leftarrow _\mathrm{{\tiny R}}\mathsf {Setup} \\ S \leftarrow _\mathrm{{\tiny R}}A(\mathsf {sk}_{-i^*}) \end{array}} \left[ { \mathbb {P} \left[ S(\mathsf {Enc}(\mathsf {msk}, i^*)) = 1\right] } - { \mathbb {P} \left[ S(\mathsf {Enc}(\mathsf {msk}, i^*-1)) = 1\right] } > \frac{1}{4n} \right] \le \frac{1}{4n} \end{aligned}$$

(1)

In the above, the inner probabilities are taken over the coins of \(\mathsf {Enc}\) and S.

Note that in the above definition we have fixed the success probability of the adversary for simplicity, and we have fixed it to a relatively large one. Requiring only a polynomially small advantage is crucial to achieving the key and ciphertext lengths we need to obtain our results, while still being sufficient to establish the hardness of differential privacy.

3.3 Index-Hiding Security Implies Hardness for Differential Privacy

It was shown by Kowalczyk et al. [40] (refining similar results from [23, 50]) that a traitor-tracing scheme satisfying index-hiding security implies a hardness result for non-interactive differential privacy.

Theorem 2

Suppose there is an \((n, \{K_\kappa , C_\kappa \})\)-traitor-tracing scheme that satisfies perfect correctness (Definition 3) and index-hiding security (Definition 4). Then there is a sequence of pairs \(\{X_\kappa , Q_\kappa \}_{\kappa \in \mathbb {N}}\) where \(Q_\kappa \) is a set of statistical queries on \(X_\kappa \), \(|Q_\kappa | = |C_\kappa |\), and \(|X_\kappa | = |K_\kappa |\) such that there is no algorithm A that is simultaneously

  1. computationally efficient,

  2. \((1, 1 \slash 4n)\)-differentially private, and

  3. \((1\slash 3, 1 \slash 2n)\)-accurate for \(Q_\kappa \) on datasets \(D \in X_{\kappa }^{n(\kappa )}\).

3.4 Two-Index-Hiding-Security

While Definition 4 is the most natural to prove hardness of privacy, it is not consistent with the usual security definition for functional encryption because of the nested “probability-of-probabilities.” In order to apply more standard notions of functional encryption, we show that index-hiding security follows from a more natural form of security for two ciphertexts.

First, consider the following \(\mathrm {IndexHiding}\) game (Fig. 1).

Fig. 1. The game \(\mathrm {IndexHiding}_{i^*}\).
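In code, the experiment underlying the game can be sketched as follows (our rendering of the prose description in Sect. 3.2; A outputs a stateless decryption program S):

    import random

    def index_hiding_experiment(Setup, Enc, A, n, i_star):
        user_keys, msk = Setup(n)
        keys_minus_istar = [sk for idx, sk in enumerate(user_keys, start=1)
                            if idx != i_star]
        S = A(keys_minus_istar)        # adversary commits to a decoder
        b = random.randint(0, 1)       # challenge bit
        c = Enc(msk, i_star - b)       # encrypt to index i* or i* - 1
        b_prime = S(c)
        return int(b_prime == b)       # 1 iff the decoder guesses b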

Let \(\mathrm {IndexHiding}_{i^*}[\vec{\mathsf {sk}}, S]\) be the game \(\mathrm {IndexHiding}_{i^*}\) where we fix the choices of \(\vec{\mathsf {sk}}\) and S. Also, define

$$\begin{aligned} \mathrm {Adv}_{i^*}[\vec{\mathsf {sk}}, S] = {\underset{\mathrm {IndexHiding}_{i^*}[\vec{\mathsf {sk}}, S]}{\mathbb {P}} \left[ b' = b\right] } - \frac{1}{2} \end{aligned}$$

so that

$$\begin{aligned} \mathrm {Adv}_{i^*} = \mathop {\mathbb {E}}_{\vec{\mathsf {sk}}, S} \left[ \mathrm {Adv}_{i^*}[\vec{\mathsf {sk}}, S] \right] . \end{aligned}$$

Then the following statement implies (1) in Definition 4:

$$\begin{aligned} \mathop {\mathbb {P}}_{\vec{\mathsf {sk}}, S} \left[ \mathrm {Adv}_{i^*}[\vec{\mathsf {sk}}, S] > \frac{1}{8n} \right] \le \frac{1}{8n}. \end{aligned}$$

(2)

We can define a related two-index-hiding game (Fig. 2), in which two independent challenge bits \(b_0, b_1\) are chosen, the program S receives encryptions to indices \(i^* - b_0\) and \(i^* - b_1\), and S must guess \(b_0 \oplus b_1\).

Fig. 2. The game \(\mathrm {TwoIndexHiding}_{i^*}\).

Analogous to what we did with \(\mathrm {IndexHiding}\), we can define \(\mathrm {TwoIndexHiding}_{i^*}[\vec{\mathsf {sk}}, S]\) to be the game \(\mathrm {TwoIndexHiding}_{i^*}\) where we fix \(\vec{\mathsf {sk}}\) and S, and define

$$\begin{aligned} \mathrm {TwoAdv}_{i^*} = {\underset{\mathrm {TwoIndexHiding}_{i^*}}{\mathbb {P}} \left[ b' = b_0 \oplus b_1\right] } - \frac{1}{2}. \end{aligned}$$

Kowalczyk et al. [40] proved the following lemma that will be useful to connect our new construction to the type of security definition that implies hardness of differential privacy.

Lemma 1

Let \(\varPi \) be a traitor-tracing scheme such that for every efficient adversary A, every \(\kappa \in \mathbb {N}\), and index \(i^* \in [n(\kappa )],\)

$$\begin{aligned} \mathrm {TwoAdv}_{i^*} \le \frac{1}{300 n^3} \end{aligned}$$

Then \(\varPi \) satisfies weak index-hiding security.

In the rest of the paper, we will construct a scheme satisfying the assumption of the above lemma with suitable key and ciphertext lengths, which we can immediately plug into Theorem 2 to obtain Theorem 1 in the introduction.

4 Cryptographic Tools

4.1 Decomposable Randomized Encodings

Let \(\mathcal {F}= \left\{ f:\{0,1\}^{\ell } \rightarrow \{0,1\}^{k}\right\} \) be a family of Boolean functions. An (information-theoretic) decomposable randomized encoding for \(\mathcal {F}\) is a pair of efficient algorithms \((\mathsf {\mathsf {DRE}{.}Encode}, \mathsf {\mathsf {DRE}{.}Decode})\) such that the following hold:

  • \(\mathsf {\mathsf {DRE}{.}Encode}\) takes as input a function \(f\in \mathcal {F}\) and randomness R and outputs a randomized encoding consisting of a set of \(\ell \) pairs of labels

    $$\begin{aligned} \tilde{F}(f, R) = \left\{ \begin{array}{ccc} \tilde{F}_{1}(f, 0, R) &{} \cdots &{} \tilde{F}_{\ell }(f, 0, R) \\ \tilde{F}_{1}(f, 1, R) &{} \cdots &{} \tilde{F}_{\ell }(f, 1, R) \end{array} \right\} \end{aligned}$$

    where the i-th pair of labels corresponds to the i-th bit of the input \(x\).

  • (Correctness) \(\mathsf {\mathsf {DRE}{.}Decode}\) takes as input a set of \(\ell \) labels corresponding to some function \(f\) and input \(x\) and outputs \(f(x)\). Specifically,

    $$\begin{aligned} \forall ~f\in \mathcal {F},~x\in \{0,1\}^{\ell } \qquad \mathsf {\mathsf {DRE}{.}Decode}\big (\tilde{F}_{1}(f, x_1, R), \dots ,\tilde{F}_{\ell }(f, x_{\ell }, R)\big ) = f(x) \end{aligned}$$

    with probability 1 over the randomness R.

  • (Information-Theoretic Security) For every function \(f\) and input \(x\), the set of labels corresponding to \(f\) and \(x\) reveals nothing other than \(f(x)\) (a toy example follows this definition). Specifically, there exists a randomized simulator \(\mathsf {\mathsf {DRE}{.}Sim}\) that depends only on the output \(f(x)\) such that

    $$\begin{aligned} \forall ~f\in \mathcal {F},~x\in \{0,1\}^{\ell }~~~~\left\{ \tilde{F}_{1}(f, x_1, R), \dots , \tilde{F}_{\ell }(f, x_{\ell }, R)\right\} \sim \mathsf {\mathsf {DRE}{.}Sim}(f(x)) \end{aligned}$$

    where \(\sim \) denotes that the two random variables are identically distributed.

  • The length of the randomized encoding is the maximum length of \(\tilde{F}(f, R)\) over all choices of \(f\in \mathcal {F}\) and the randomness R.
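As a toy example of the definition (our own; not used in the construction), the two-bit XOR function has a perfect decomposable randomized encoding in which both labels are one-time-padded by the same random bit:

    import secrets

    def dre_encode_xor():
        # Encode f(x1, x2) = x1 XOR x2: the label for position i and bit b
        # is F_i(f, b, R) = b XOR r, where r is the shared randomness R.
        r = secrets.randbits(1)
        return {(1, 0): 0 ^ r, (1, 1): 1 ^ r,
                (2, 0): 0 ^ r, (2, 1): 1 ^ r}

    def dre_decode_xor(l1, l2):
        return l1 ^ l2  # (x1 ^ r) ^ (x2 ^ r) = x1 ^ x2

    def dre_sim_xor(fx):
        # Simulator: given only f(x), output identically distributed labels.
        u = secrets.randbits(1)
        return u, u ^ fx

Selecting the labels for input \((x_1, x_2)\) yields \((x_1 \oplus r, x_2 \oplus r)\), whose joint distribution depends only on \(x_1 \oplus x_2\), exactly matching the simulator.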

We will utilize the fact that functions computable in low depth have small decomposable randomized encodings.

Theorem 3

([2, 39]). If \(\mathcal {F}\) is a family of functions such that a universal function for \(\mathcal {F}\), \(U(f, x) = f(x)\), can be computed by Boolean formulae of depth d (with fan-in 2, over the basis \(\{\wedge , \vee , \lnot \}\)), then \(\mathcal {F}\) has an information-theoretic decomposable randomized encoding of length \(O(4^d)\).

4.2 Private Key Functional Encryption

Let \(\mathcal {F}= \left\{ f:\{0,1\}^{\ell } \rightarrow \{0,1\}^{k}\right\} \) be a family of functions. A private key functional encryption scheme for \(\mathcal {F}\) is a tuple of polynomial-time algorithms \(\varPi _{ FE }= (\mathsf {FE{.}Setup}, \mathsf {FE{.}KeyGen}, \mathsf {FE{.}Enc}, \mathsf {FE{.}Dec})\) with the following syntax and properties:

  • \(\mathsf {FE{.}Setup}\) takes a security parameter \(1^{\kappa }\) and outputs a master secret key \(\mathsf {FE{.}msk}\).

  • \(\mathsf {FE{.}KeyGen}\) takes a master secret key \(\mathsf {FE{.}msk}\) and a function \(f\in \mathcal {F}\) and outputs a secret key \(\mathsf {FE{.}sk}_{f}\) corresponding to the function \(f\).

  • \(\mathsf {FE{.}Enc}\) takes the master secret key \(\mathsf {FE{.}msk}\) and an input \(x\in \{0,1\}^\ell \) and outputs a ciphertext \(c\) corresponding to the input \(x\).

  • (Correctness) \(\mathsf {FE{.}Dec}\) takes a secret key \(\mathsf {FE{.}sk}\) corresponding to a function \(f\) and a ciphertext \(c\) corresponding to an input \(x\) and outputs \(f(x)\). Specifically, for every \(\mathsf {FE{.}msk}\) in the support of \(\mathsf {FE{.}Setup}\),

    $$\begin{aligned} \mathsf {FE{.}Dec}( \mathsf {FE{.}KeyGen}(\mathsf {FE{.}msk}, f), \mathsf {FE{.}Enc}(\mathsf {FE{.}msk}, x) ) = f(x) \end{aligned}$$
  • The key length is the maximum length of \(\mathsf {FE{.}sk}\) over all choices of \(f\in \mathcal {F}\) and the randomness of \(\mathsf {FE{.}Setup}, \mathsf {FE{.}Enc}\). The ciphertext length is the maximum length of \(c\) over all choices of \(x\in \{0,1\}^{\ell }\) and the randomness of \(\mathsf {FE{.}Setup}, \mathsf {FE{.}Enc}\).

  • (Security) We will use a non-adaptive simulation-based definition of security. In particular, we are interested in security for a large number of keys n and a small number of ciphertexts m. We define security through the pair of games in Fig. 3. We say that \(\varPi _{ FE }\) is \((n,m,\varepsilon )\)-secure if there exists a polynomial-time simulator \(\mathsf {FE{.}Sim}\) such that for every polynomial-time adversary \(\mathcal {A}\) and every \(\kappa \),

    $$\begin{aligned} \left| { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {real}}(\varPi _{ FE }, \mathcal {A}) = 1\right] } - { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {ideal}}(\varPi _{ FE }, \mathcal {A}, \mathsf {FE{.}Sim}) = 1\right] } \right| \le \varepsilon (\kappa ) \end{aligned}$$
Fig. 3. Security of functional encryption.

Our goal is to construct a functional encryption scheme that is \((n,2,\frac{1}{300n^3})\)-secure and has short ciphertexts and keys, where \(n = n(\kappa )\) is a polynomial in the security parameter. Although it is not difficult to see, we prove in Sect. 7 that the definition of security above implies the two-index-hiding security that we use in Lemma 1.

Function-Hiding Functional Encryption. As an ingredient in our construction we also need a notion of function-hiding security for a (one-message) functional encryption scheme. Since we will only need this definition for a single message, we will specialize to that case in order to simplify notation. We say that \(\varPi _{ FE }\) is function-hiding \((n,1,\varepsilon )\)-secure if there exists a polynomial-time simulator \(\mathsf {FE{.}Sim}\) such that for every polynomial-time adversary \(\mathcal {A}\) and every \(\kappa \),

$$\begin{aligned} \left| { \mathbb {P} \left[ \bar{E}_{\kappa ,n,1}^{\mathrm {real}}(\varPi _{ FE }, \mathcal {A}) = 1\right] } - { \mathbb {P} \left[ \bar{E}_{\kappa ,n,1}^{\mathrm {ideal}}(\varPi _{ FE }, \mathcal {A}, \mathsf {FE{.}Sim}) = 1\right] } \right| \le \varepsilon (\kappa ) \end{aligned}$$

where \(\bar{E}_{\kappa ,n,1}^{\mathrm {real}}, \bar{E}_{\kappa ,n,1}^{\mathrm {ideal}}\) are the same experiments as \(E_{\kappa ,n,1}^{\mathrm {real}}, E_{\kappa ,n,1}^{\mathrm {ideal}}\) except that the simulator in \(\bar{E}_{\kappa ,n,1}^{\mathrm {ideal}}\) is not given the functions \(f_i\) as input. Namely, in \(\bar{E}_{\kappa ,n,1}^{\mathrm {ideal}}\):

$$\begin{aligned} \left( \left\{ \mathsf {FE{.}sk}_{f_i}\right\} _{i=1}^{n}, \; c\right) \leftarrow _\mathrm{{\tiny R}}\mathsf {FE{.}Sim}\left( \left\{ f_i(x)\right\} _{i=1}^{n} \right) \end{aligned}$$

A main ingredient in the construction will be a function-hiding functional encryption scheme that is \((n,1,\mathrm {negl}(\kappa ))\)-secure. The construction is a small variant of the constructions of Sahai and Seyalioglu [44] and Gorbunov et al. [30].

Theorem 4

(Variant of [30, 44]). Let \(\mathcal {F}\) be a family of functions such that a universal function for \(\mathcal {F}\) has a decomposable randomized encoding of length L. That is, the function \(U(f, x) = f(x)\) has a \( DRE \) of length L. If one-way functions exist, then for any polynomial \(n = n(\kappa )\) there is an \((n,1,\mathrm {negl}(\kappa ))\)-function-hiding-secure functional encryption scheme \(\varPi \) with key length L and ciphertext length \(O(\kappa L)\).

Although this theorem follows in a relatively straightforward way from the techniques of [30, 44], we will give a proof of this theorem in Sect. 5. The main novelty in the theorem is to verify that in settings where we have a very short DRE—shorter than the security parameter \(\kappa \)—we can make the secret keys have length proportional to the length of the DRE rather than proportional to the security parameter.

5 One-Message Functional Encryption

We will now construct \(\varPi _{ OFE }= (\mathsf {OFE{.}Setup}, \mathsf {OFE{.}KeyGen}, \mathsf {OFE{.}Enc}, \mathsf {OFE{.}Dec})\): a function-hiding \((n,1,\mathrm {negl}(\kappa ))\)-secure functional encryption scheme for functions with an (information-theoretic) decomposable randomized encoding \(\mathsf {DRE}\). The construction is essentially the same as the (public-key) variants given by [30, 44], with two differences. First, we consider information-theoretic randomized encodings instead of computationally secure ones. Second, instead of encrypting the labels under a public-key encryption scheme, we take advantage of the private-key setting to use an encryption method whose ciphertexts are no longer than the message whenever the message is shorter than the security parameter: we use a PRF evaluated on known indices to mask each short label of \(\mathsf {DRE}\). Because we encrypt the labels of a short randomized encoding, this allows us to argue that the keys of our scheme are small.

Let \(n = \mathrm {poly}(\kappa )\) denote the number of users for the scheme. We assume for simplicity that \(\lg n\) is an integer. Our construction will rely on the following primitives:

  • A PRF family \(\{\mathsf {PRF}_{\mathsf {sk}} : \{0,1\}^{\lg n} \rightarrow \{0,1\}^{\lg n} \mid \mathsf {sk} \in \{0,1\}^\kappa \}\).

  • A DRE of \(f_{y}(x) = \mathbb {I}\{x\ge y\}\) where \(x, y \in \{0,1\}^{\lg n}\) (Fig. 4).

Fig. 4. Our scheme \(\varPi _{ OFE }\).
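The following Python sketch is our reconstruction of how \(\varPi _{ OFE }\) fits together, inferred from the decryption equation in Sect. 5.1 below; it may differ from Fig. 4 in details (HMAC stands in for the PRF, and DRE labels are assumed to be byte strings no longer than the PRF output).

    import hmac, hashlib, secrets

    def prf(key, j):
        # PRF_{key}(j); xor() below truncates it to the label length.
        return hmac.new(key, str(j).encode(), hashlib.sha256).digest()

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def Setup(n, ell, dre_encode, functions):
        # msk: a PRF key sk_{l,b} per position l and bit b, plus a mask x.
        prf_keys = {(l, b): secrets.token_bytes(16)
                    for l in range(1, ell + 1) for b in (0, 1)}
        x = [secrets.randbits(1) for _ in range(ell)]
        user_keys = []
        for j, f_j in enumerate(functions, start=1):
            labels = dre_encode(f_j)  # labels[(l, v)] = F_l(f_j, v, R_j)
            # sk_j: each label, masked under the PRF key whose selector
            # bit is b = v XOR x_l, evaluated at the user's index j.
            masked = {(l, v ^ x[l - 1]):
                          xor(prf(prf_keys[(l, v ^ x[l - 1])], j),
                              labels[(l, v)])
                      for l in range(1, ell + 1) for v in (0, 1)}
            user_keys.append((j, masked))
        return user_keys, (prf_keys, x)

    def Enc(msk, i_bits):
        # Ciphertext: y = i XOR x plus the ell PRF keys selected by y.
        prf_keys, x = msk
        y = [ib ^ xb for ib, xb in zip(i_bits, x)]
        return y, [prf_keys[(l, y[l - 1])] for l in range(1, len(y) + 1)]

    def Dec(c, sk, dre_decode):
        y, keys = c
        j, masked = sk
        # Unmasking position l yields F_l(f_j, y_l XOR x_l) = F_l(f_j, i_l).
        labels = [xor(prf(keys[l - 1], j), masked[(l, y[l - 1])])
                  for l in range(1, len(y) + 1)]
        return dre_decode(labels)

Note that the ciphertext consists of \(\lg n\) PRF keys of length \(\kappa \) plus the \(\lg n\)-bit string y, matching the ciphertext length \(\lg n \cdot \kappa + \lg n\) computed at the end of this section.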

5.1 Proof of Correctness

$$\begin{aligned} \mathsf {Dec}( c _i , \mathsf {sk}_j)&= \mathsf {\mathsf {DRE}{.}Decode}(\mathsf {PRF}_{\mathsf {sk}_{1, y_{1}}}(j) \oplus K^{(j)}_{1,y_1}, ..., \mathsf {PRF}_{\mathsf {sk}_{\lg n, y_{\lg n}}}(j) \oplus K^{(j)}_{\lg n,y_{\lg n}} )\\&=\mathsf {\mathsf {DRE}{.}Decode}( \tilde{F}_1(f_j, y_1 \oplus x_1,R_j), ...,\tilde{F}_{\lg n}(f_j, y_{\lg n} \oplus x_{\lg n},R_j) )\\&=\mathsf {\mathsf {DRE}{.}Decode}( \tilde{F}_1(f_j, i_1,R_j), ...,\tilde{F}_{\lg n}(f_j, i_{\lg n},R_j) )\\&= f_j(i)\ \end{aligned}$$

where the last step uses the (perfect) correctness of the randomized encoding scheme. So \(\mathsf {Dec}(c_i, \mathsf {sk}_j) = f_j(i) = \mathbb {I}\{i \ge j\}\) for every i and j, as required.

5.2 Proof of Security

Lemma 2

\( \left| { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {real}}(\varPi _{ OFE }, \mathcal {A}) = 1\right] } - { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {ideal}}(\varPi _{ OFE }, \mathcal {A}, \mathsf {FE{.}Sim}) = 1\right] } \right| \le \varepsilon (\kappa )\)

Proof

Consider the hybrid scheme \(\varPi _{ OFE }^*\) defined in Fig. 5, which uses a truly random string instead of the output of a PRF for the encrypted labels corresponding to the off-bits of \(y = x \oplus i\). Note that this scheme is only useful in the nonadaptive security game, where i is known at time of \(\mathsf {Setup}\) (since it is needed to compute y). We can easily show that the scheme is indistinguishable from the original scheme in the nonadaptive security game.

Fig. 5. Hybrid scheme \(\varPi _{ OFE }^*\).

Lemma 3

$$\begin{aligned} \left| { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {real}}(\varPi _{ OFE }, \mathcal {A}) = 1\right] } - { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {real}}(\varPi _{ OFE }^*, \mathcal {A}) = 1\right] } \right| \le \lg n \cdot \mathsf {PRF}.\mathrm {Adv}(\kappa ) = \varepsilon (\kappa ) \end{aligned}$$

Proof

Follows easily by the security of the PRF, applied \(\lg n\) times in a hybrid argument, once for each PRF key corresponding to an off-bit of y.

In Fig. 6 we define a simulator for the ideal setting that is indistinguishable from our hybrid scheme \(\varPi _{ OFE }^*\). The simulator uses the simulator for the decomposable randomized encoding scheme to generate the labels to be encrypted using only the knowledge of the output value of the functions on the input.

Fig. 6. The simulator \(\mathsf {OFE{.}Sim}\) for \(\varPi _{ OFE }\).

Lemma 4

\(\left| { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {real}}(\varPi _{ OFE }^*, \mathcal {A}) = 1\right] } - { \mathbb {P} \left[ E_{\kappa ,n,m}^{\mathrm {ideal}}(\varPi _{ OFE }, \mathcal {A}, \mathsf {FE{.}Sim}) = 1\right] } \right| = 0\)

Proof

Follows easily by the information-theoretic security of the randomized encoding.

Adding the statements of Lemma 3 and Lemma 4 via the triangle inequality gives the statement of Lemma 2.

So, \(\varPi _{ OFE }\) is an \((n,1,\mathrm {negl}(\kappa ))\)-function-hiding-secure private-key functional encryption scheme, with security based on the hardness of a PRF (which can be instantiated from any one-way function) and on the existence of an information-theoretic randomized encoding of length L for the family of comparison functions \(\{f_j : \{0,1\}^{\lg n} \rightarrow \{0,1\}\}\). Furthermore, the length of ciphertexts is \(\lg n \cdot \kappa + \lg n = O(\kappa L)\) and the length of each key is L, satisfying the conditions of Theorem 4.

6 A Two-Message Functional Encryption Scheme for Comparison

We now use the one-message functional encryption scheme \(\varPi _{ OFE }\) described in Sect. 5 to construct a functional encryption scheme \(\varPi _{ FE }\) that is \((n,2,\frac{1}{300n^3})\)-secure for the family of comparison functions. For any \(y \in \{0,1\}^{\ell }\), let

$$\begin{aligned} f_{y}(x) = \mathbb {I}\{x\ge y\} \end{aligned}$$

where the comparison operation treats \(x, y\) as numbers in binary. We define the family of functions

$$\begin{aligned} \mathcal {F}_{\mathrm {comp}}:= \left\{ f_{y} : \{0,1\}^{\ell } \rightarrow \{0,1\}~\left| ~y \in \{0,1\}^{\ell } \right. \right\} \end{aligned}$$

In our application, we need \(x, y \in \left\{ 0,1,\dots ,n\right\} \), so we will set \(\ell = \lceil \log _2(n+1) \rceil = O(\log n)\). One important property of our construction is that the user key length will be fairly small as a function of \(\ell \), so that when \(\ell = O(\log n)\), the overall length of user keys will be \(n^{o(1)}\) (in fact, nearly polylogarithmic in n).

6.1 Construction

Our construction will be for a generic family of functions \(\mathcal {F}\), and we will only specialize the construction to \(\mathcal {F}_{\mathrm {comp}}\) when setting the parameters and bounding the length of the scheme. Before giving the formal construction, let us gather some notation and ingredients. Note that we will introduce some additional parameters that are necessary to specify the scheme, but we will leave many of these parameters to be determined later.

  • Let n be a parameter bounding the number of user keys in the scheme, and let \(\mathbb {F}\) be a finite field whose size we will determine later.

  • Let \(\mathcal {F}= \left\{ f:\{0,1\}^{\ell } \rightarrow \{0,1\}\right\} \) be a family of functions. For each function \(f\in \mathcal {F}\) we define an associated polynomial \(\tilde{f}:\mathbb {F}^{\ell +1} \rightarrow \mathbb {F}\) as follows:

    1. Let \(\hat{f} :\mathbb {F}^{\ell } \rightarrow \mathbb {F}\) be a polynomial computing \(f\).

    2. Define \(\tilde{f}:\mathbb {F}^{\ell +1} \rightarrow \mathbb {F}\) to be \(\tilde{f}(x_{1},\dots ,x_{\ell }, z) = \hat{f}(x_1,\dots ,x_{\ell }) + z\).

    Let \(D\) and \(S\) be such that for every \(f\in \mathcal {F}\), the associated polynomial \(\tilde{f}\) has degree at most \(D\) and can be computed by an arithmetic circuit of size at most \(S\). These degree and size parameters will depend on \(\mathcal {F}\).

  • Let \(\mathcal {P}_{D', S', \mathbb {F}}\) be the set of all univariate polynomials \(p :\mathbb {F}\rightarrow \mathbb {F}\) of degree at most \(D'\) and size at most \(S'\). Let \(\varPi _{ OFE }= (\mathsf {OFE{.}Setup}, \mathsf {OFE{.}KeyGen}, \mathsf {OFE{.}Enc}, \mathsf {OFE{.}Dec})\) be an \((n,1,\mathrm {negl}(\kappa ))\)-function-hiding-secure functional encryption scheme (i.e. secure for n keys and one message) for the family of polynomials \(\mathcal {P}_{D',S',\mathbb {F}}\).

We’re now ready to describe the construction of the two-message functional encryption scheme \(\varPi _{ FE }\). The scheme is specified in Fig. 7.

Fig. 7. A functional encryption scheme for 2 messages.

Correctness of \(\varPi _{ FE }\). Before going on to prove security, we will verify that encryption and decryption are correct for our scheme. Fix any \(f_i \in \mathcal {F}\), let \(\hat{f}_i :\mathbb {F}^{\ell } \rightarrow \mathbb {F}\) be the polynomial computing \(f_i\), and fix any input \(x\in \mathbb {F}^{\ell }\). Let \(r_i :\mathbb {F}\rightarrow \mathbb {F}\) be the degree \(R D\) polynomial chosen by \(\mathsf {FE{.}KeyGen}\) on input \(f_i\) and let \(\tilde{f}_{i,r_{i}}(\cdot , t)\) be the function used to generate the key \(\mathsf {OFE{.}sk}_{i,t}\). Let \(q :\mathbb {F}\rightarrow \mathbb {F}^{\ell }\) be the degree R polynomial map chosen by \(\mathsf {FE{.}Enc}\) on input \(x\). Observe that, by correctness of \(\varPi _{ OFE }\), when we run \(\mathsf {FE{.}Dec}\) we will have

$$\begin{aligned} \tilde{p}(t) = \mathsf {OFE{.}Dec}(\mathsf {OFE{.}sk}_{i,t}, c_{t}) = \tilde{f}_{i,r_{i}}(q(t),t) = \hat{f}_{i}(q(t)) + r_i(t). \end{aligned}$$

Now, consider the polynomial \(\tilde{f}_{i,q,r_i} :\mathbb {F}\rightarrow \mathbb {F}\) defined by

$$\begin{aligned} \tilde{f}_{i,q,r_i}(t) = \hat{f}_{i}(q(t)) + r_{i}(t). \end{aligned}$$

Since \(\hat{f}_i\) has degree at most \(D\), q has degree at most R, and \(r_{i}\) has degree at most \(R D\), the degree of \(\tilde{f}_{i,q,r_i}\) is at most \(RD\). Since \(|\mathcal {U}| = R D+ 1\), the polynomial \(\tilde{p}\) agrees with \(\tilde{f}_{i,q,r_{i}}\) at \(RD+ 1\) distinct points, and thus \(\tilde{p} \equiv \tilde{f}_{i,q,r_{i}}\). In particular, \(\tilde{p}(0) = \tilde{f}_{i,q,r_{i}}(0)\). Since we chose \(r_i\) and q such that \(r_i(0) = 0\) and \(q(0) = x\), we have \(\tilde{p}(0) = \hat{f}_{i}(q(0)) + r_i(0) = \hat{f}_{i}(x) = f_i(x)\). This completes the proof of correctness.
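The interpolation step can be checked numerically; the following self-contained sketch (our own illustration over a small prime field, with toy parameters and a one-dimensional input for simplicity) mirrors the correctness argument:

    import random

    P = 2**61 - 1  # a Mersenne prime; plays the role of the field F

    def rand_poly(deg, const):
        # A random polynomial of the given degree with p(0) = const.
        return [const] + [random.randrange(P) for _ in range(deg)]

    def ev(poly, t):
        return sum(c * pow(t, k, P) for k, c in enumerate(poly)) % P

    def lagrange_at_zero(points):
        # Interpolate p(0) from (t, p(t)) pairs; needs deg(p) + 1 points.
        acc = 0
        for ti, yi in points:
            num, den = 1, 1
            for tj, _ in points:
                if tj != ti:
                    num = num * (-tj) % P
                    den = den * (ti - tj) % P
            acc = (acc + yi * num * pow(den, P - 2, P)) % P
        return acc

    R, D = 3, 2
    f = lambda x: (x * x + 5 * x + 1) % P  # a toy degree-D polynomial f_i
    x0 = 42                                # the encrypted input x
    q = rand_poly(R, x0)                   # q(0) = x
    r = rand_poly(R * D, 0)                # r(0) = 0
    # Decryptions at RD + 1 points determine f(q(t)) + r(t) everywhere:
    pts = [(t, (f(ev(q, t)) + ev(r, t)) % P) for t in range(1, R * D + 2)]
    assert lagrange_at_zero(pts) == f(x0)  # p~(0) recovers f(x)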

6.2 Security for Two Messages

Theorem 5

For every polynomial \(n = n(\kappa )\), \(\varPi _{ FE }\) is \((n,2,\delta )\)-secure for \(\delta = T \cdot \mathrm {negl}(\kappa ) + 2^{-\varOmega (R)}\).

First we describe at a high level how to simulate. To do so, it will be useful to first introduce some terminology. Recall that \(\mathsf {FE{.}Setup}\) instantiates T independent copies of the one-message scheme \(\varPi _{ OFE }\). We refer to each instantiation as a component. Thus, when we talk about generating a secret key for a function \(f_i\) we will talk about generating each of the T components of that key, and similarly when we talk about generating a ciphertext for an input \(x_b\) we will talk about generating each of the U components of that ciphertext. Thus the simulator has to generate a total of nT components of keys and 2U components of ciphertexts. The simulator will consider several types of components:

  • Components \(t \in \mathcal {U}_{1}\, \cap \, \mathcal {U}_{2}\) where \(\mathcal {U}_{1}, \mathcal {U}_2\) are the random sets of components chosen by the encryption scheme for the two inputs, respectively. The adversary obtains two ciphertexts for these components, so we cannot use the simulator for the one-message scheme. Thus for these components we simply choose uniformly random values for all the keys and ciphertexts and use the real one-message scheme.

  • Components \(t \in \mathcal {U}_{1}\! \bigtriangleup \mathcal {U}_{2}\) (where \(\bigtriangleup \) is the symmetric difference). For these we want to use the simulator for the one-message scheme to generate both the keys for each function and the ciphertexts for these components (recall that the one-message scheme is function-hiding). To do so, we need to feed the simulator with the evaluation of each of the functions on the chosen input. We show how to generate these outputs by leveraging the random high-degree polynomials \(r_i\) included in the keys. These values are then fed into the simulator to produce the appropriate key and ciphertext components.

  • Components \(t \not \in \mathcal {U}_{1} \cup \mathcal {U}_{2}\). For these components the real scheme would not generate a ciphertext so the distribution can be simulated by a simulator that takes no inputs.

With this outline in place, it is not too difficult to construct and analyze the simulator.

Fig. 8. The simulator \(\mathsf {FE{.}Sim}\) for \(\varPi _{ FE }\).

Proof

(Proof of Theorem 5). We prove security via the simulator described in Fig. 8.

First we make a simple claim showing that there is only a small probability that the simulator has to halt and output \(\bot \) because \(\mathcal {I}\) is too large.

Claim

\({ \mathbb {P} \left[ \mathsf {FE{.}Sim}= \bot \right] } \le 2^{-\varOmega (R)}\).

Proof

(Proof Sketch for Claim in Sect. 6.2). Recall that \(\mathcal {I}\) is defined to be \(\mathcal {U}_{1} \cap \mathcal {U}_{2}\). Since \(\mathcal {U}_1, \mathcal {U}_2\) are random subsets of [T], each of size U, and we set \(T = U^2\), we have \({ \mathbb {E} \left[ |\mathcal {I}|\right] } = 1\). Moreover, the intersection of the two sets has a hypergeometric distribution, and by a standard tail bound for the hypergeometric distribution we have \({ \mathbb {P} \left[ \mathsf {FE{.}Sim}= \bot \right] } = { \mathbb {P} \left[ |\mathcal {I}| > R\right] } \le 2^{-\varOmega (R)}\).

In light of the above claim, we will assume for the remainder of the analysis that the simulator does not output \(\bot \), and thus \(|\mathcal {I}| \le R\); this will only add \(2^{-\varOmega (R)}\) to the simulation error. In what follows, we will simplify notation by referring only to components corresponding to keys and one of the ciphertexts, and will drop the superscript b. All of our arguments also apply to the second ciphertext, since this ciphertext is generated in a completely symmetric way.

Components Used in Both Ciphertexts. First, we claim that the simulator produces the correct distribution of the keys and ciphertexts for the components \(t \in \mathcal {I}\). Note that the simulator chooses the keys in exactly the same way as the real scheme would: it generates keys for the functions \(\tilde{f}_{i,r_{i}}(\cdot , t)\) where \(r_{i}\) is a random degree \(R D\) polynomial with the constant coefficient 0. The ciphertexts in the real scheme would contain the messages \(\left\{ q(t)\right\} _{t \in \mathcal {U}}\) where q is a random degree R polynomial with constant coefficient equal to the (unknown) input \(x\). Since \(|\mathcal {I}| \le R\), this is a uniformly random set of values. Thus, the distribution of \(\left\{ \alpha _{t}\right\} _{t \in \mathcal {U}}\) is identical to \(\left\{ q(t)\right\} _{t \in \mathcal {U}}\), and therefore the simulated ciphertext components and the real ciphertext components have the same distribution.

Components Used in Exactly One Ciphertext. Next we claim that the simulated keys and ciphertexts for the components \(t \in \mathcal {U}\setminus \mathcal {I}\) are computationally indistinguishable from those of the real scheme. Since in these components we only need to generate a single ciphertext, we can rely on the simulator \(\mathsf {OFE{.}Sim}\) for the one-message scheme. \(\mathsf {OFE{.}Sim}\) takes evaluations of n functions each at a single input and simulates the keys for those n functions and the ciphertext for that single input. In order to apply the indistinguishability guarantee for \(\mathsf {OFE{.}Sim}\), we need to argue that the evaluations that \(\mathsf {FE{.}Sim}\) feeds to \(\mathsf {OFE{.}Sim}\) are jointly identically distributed to the real scheme.

Recall that in the real scheme, each key corresponds to a function \(\tilde{f}_{i,r_{i}}(\cdot , t)\), and this function is evaluated at the point q(t). Thus, for each function i and each ciphertext component t, the evaluation is \(\tilde{f}_{i,q,r_i}(t) = \tilde{f}_{i}(q(t)) + r_i(t)\). The polynomials \(q,r_1,\dots ,r_n\) are chosen so that for every i, \(\tilde{f}_{i,q,r_i}(0) = \tilde{f}_{i}(x)\), where \(x\) is the (unknown) input. We need to argue that the set of evaluations \(\{\tilde{y}_{i,t}\}\) generated by the simulator has the same distribution as \(\{\tilde{f}_{i,q,r_i}(t)\}\). Observe that, since \(r_i\) is a random polynomial of degree \(RD\) with constant coefficient 0, its evaluations at any set of \(RD\) distinct nonzero points are jointly uniformly random. Therefore, for every q chosen independently of \(r_i\), the evaluations of \(\tilde{f}_{i,q,r_i}\) at any set of \(RD\) such points are also jointly uniformly random. On the other hand, the evaluations of \(\tilde{f}_{i,q,r_i}\) at any set of \(RD+ 1\) points determine the whole function, and in particular determine \(\tilde{f}_{i,q,r_i}(0)\). Therefore, conditioned on the evaluations at any set of \(RD\) points and on the desired value of \(\tilde{f}_{i,q,r_i}(0)\), the evaluation at any other point is uniquely determined.

Now, in the simulator, for every i, we choose \(RD\) evaluations \(\tilde{y}_{i,t}\) uniformly at random: for the points \(t \in \mathcal {I}\) they are uniformly random because the polynomials \(r_i\) and the values \(\alpha _{t}\) were chosen randomly, and for all but one point in \(\mathcal {U}\setminus \mathcal {I}\) we explicitly chose them to be uniformly random. For the remaining point, we chose \(\tilde{y}_{i,t}\) to be the unique value for which we obtain the correct evaluation \(\tilde{f}_{i,q,r_i}(0)\), namely the value \(\tilde{y}_{i}\) that was given to the simulator. Thus, for each individual i, the distribution of the evaluations \(\tilde{y}_{i,t}\) that we give to \(\mathsf {OFE{.}Sim}\) is identical to that of the real scheme. That this holds jointly over all i follows immediately from the independence of the polynomials \(r_1,\dots ,r_n\).
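
The programming step in the last paragraph can be made concrete. The sketch below, over a toy prime field and with hypothetical parameters (p = 97, RD = 4), fixes \(RD\) uniformly random evaluations and then solves for the one remaining evaluation so that the unique degree-\(RD\) interpolating polynomial takes a prescribed value at 0.

```python
# Lagrange interpolation over a toy prime field F_p.
import random

p = 97

def lagrange_at_zero(pts):
    """Value at 0 of the unique polynomial of degree < len(pts) through pts."""
    total = 0
    for i, (ti, vi) in enumerate(pts):
        num, den = 1, 1
        for j, (tj, _) in enumerate(pts):
            if i != j:
                num = num * (-tj) % p
                den = den * (ti - tj) % p
        total = (total + vi * num * pow(den, -1, p)) % p
    return total

def program_last_point(points, y_target):
    """Pick random evaluations at all but the last point, then choose the last
    evaluation so the interpolant equals y_target at 0.  The interpolant's
    value at 0 is an affine function of the unknown last evaluation v, so we
    recover v from the values the interpolant takes when v = 0 and v = 1."""
    pts = [(t, random.randrange(p)) for t in points[:-1]]
    base = lagrange_at_zero(pts + [(points[-1], 0)])
    slope = (lagrange_at_zero(pts + [(points[-1], 1)]) - base) % p
    v = (y_target - base) * pow(slope, -1, p) % p
    return pts + [(points[-1], v)]

RD = 4
points = random.sample(range(1, p), RD + 1)   # RD + 1 distinct nonzero points
shares = program_last_point(points, y_target=55)
print(lagrange_at_zero(shares))               # 55: the programmed value at 0
```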

Components Used in Neither Ciphertext. Since the underlying one-message scheme satisfies function-hiding, the distribution of n keys and no messages must be computationally indistinguishable from a fixed distribution; that is, it can be simulated given no evaluations. Thus we can simply generate the keys for these unused components in a completely oblivious way.

Since we have argued that all components are simulated correctly, we can complete the proof with a hybrid argument over the simulation error for each of the T components, together with a union bound with the failure probability of the event \(|\mathcal {I}| > R\). This shows that \(\mathsf {FE{.}Sim}\) and the real scheme are computationally indistinguishable with the claimed parameters.

6.3 Bounding the Scheme Length for Comparison Functions

In the application to differential privacy, we need to instantiate the scheme for the family of comparison functions of the form \(f_{y}(x) = \mathbb {I}\{x\ge y\}\) where \(x, y \in \{0,1\}^{\log n}\), and we need to set the parameters to ensure \((n,2,\frac{1}{300n^3})\)-security where \(n = n(\kappa )\) is an arbitrary polynomial.

Theorem 6

For every polynomial \(n = n(\kappa )\) there is an \((n,2,\frac{1}{300n^3})\)-secure functional encryption scheme for the family of comparison functions on \(O(\log n)\) bits, with keys in \(K_{\kappa }\) and ciphertexts in \(C_{\kappa }\), where

$$\begin{aligned} |K_{\kappa }| = 2^{2^{\mathrm {poly}(\log \log n)}} = 2^{n^{o(1)}} \quad \text {and} \quad |C_{\kappa }| = 2^{\kappa }. \end{aligned}$$

Theorem 1 follows by combining Theorem 6 with Theorem 2. Note that Theorem 6 constructs a different scheme for every polynomial \(n = n(\kappa )\). However, we can obtain a single scheme that is secure for every polynomial \(n(\kappa )\) by instantiating this construction for some \(n'(\kappa ) = \kappa ^{\omega (1)}\).

Proof

(Proof of Theorem 6). By Theorem 5, if the underlying one-message scheme \(\varPi _{ OFE }\) is \((n,1,\mathrm {negl}(\kappa ))\)-function-hiding secure, then the final scheme \(\varPi _{ FE }\) will be \((n,2,\delta )\)-secure for \(\delta = T\cdot \mathrm {negl}(\kappa ) + 2^{-\varOmega (R)}\). If we choose an appropriate \(R = \varTheta (\log n)\), then we will have \(\delta \le T \cdot \mathrm {negl}(\kappa ) + \frac{1}{600n^3}\). As we will see, T is polynomial in n, so for sufficiently large values of \(\kappa \) we will have \(\delta \le \frac{1}{300n^3}\). To complete the proof, we bound the length of the keys and ciphertexts:

The functions constructed in \(\mathsf {FE{.}KeyGen}\) have small DREs. For the family of comparison functions on \(\log n\) bits, there is a universal Boolean formula \(u(x,y) :\{0,1\}^{\log n} \times \{0,1\}^{\log n} \rightarrow \{0,1\}\) of size \(S = O(\log n)\) and depth \(d = O(\log \log n)\) that computes \(f_{y}(x)\). Thus, for any field \(\mathbb {F}\), the polynomial \(\tilde{u}(x,y) :\mathbb {F}^{\log n} \times \mathbb {F}^{\log n} \rightarrow \mathbb {F}\) is computable by an arithmetic circuit of size \(S = O(\log n)\) and depth \(d = O(\log \log n)\), and this polynomial computes \(\tilde{f}_{y}(x)\). For any value \(r \in \mathbb {F}\), the polynomial \(\tilde{u}_{r}(x,y) = \tilde{u}(x,y) + r\) is also computable by an arithmetic circuit of size \(S + 1 = O(\log n)\) and depth d. Note that this polynomial is a universal evaluator for the polynomials \(\tilde{f}_{y,r}(\cdot , t) = \tilde{f}_{y}(\cdot ) + r(t)\) created in \(\mathsf {FE{.}KeyGen}\).
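
For concreteness, the following sketch shows the standard divide-and-conquer structure behind such a shallow comparator. This is an illustrative recursion, not necessarily the exact formula the cited construction uses: it exhibits the depth bound \(O(\log m)\) on m-bit inputs (so \(O(\log \log n)\) for \(m = \log n\)), while achieving size \(O(\log n)\) simultaneously requires a more careful construction.

```python
# I{x >= y} on equal-length bit tuples (most significant bit first) via a
# balanced recursion: compare the high halves, and fall through to the low
# halves only when the high halves are equal.  Each recursion level adds O(1)
# depth on top of its subcalls, so the formula depth is O(log m).
def eq(x, y):
    if len(x) == 1:
        return 1 - (x[0] ^ y[0])
    h = len(x) // 2
    return eq(x[:h], y[:h]) & eq(x[h:], y[h:])

def ge(x, y):
    if len(x) == 1:
        return x[0] | (1 - y[0])   # x >= y fails only when x = 0, y = 1
    h = len(x) // 2
    hi_eq = eq(x[:h], y[:h])
    return (ge(x[:h], y[:h]) & (1 - hi_eq)) | (hi_eq & ge(x[h:], y[h:]))

assert ge((1, 0, 1, 1), (1, 0, 1, 0)) == 1   # 11 >= 10
assert ge((0, 1, 1, 1), (1, 0, 0, 0)) == 0   # 7 >= 8 is false
```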

To obtain a DRE, we can write \(\tilde{u}_{r}(x,y)\) as a Boolean formula \(u_{r,\mathbb {F}}(x,y) :\{0,1\}^{(\log n)(\log |\mathbb {F}|)} \times \{0,1\}^{(\log n)(\log |\mathbb {F}|)} \rightarrow \{0,1\}^{\log |\mathbb {F}|}\) with depth \(d' = d\cdot \mathrm {depth}(\mathbb {F})\) and size \(S' = S \cdot \mathrm {size}(\mathbb {F})\), where \(\mathrm {depth}(\mathbb {F})\) and \(\mathrm {size}(\mathbb {F})\) are the depth and size of Boolean formulae computing operations in the field \(\mathbb {F}\), respectively. Later we will argue that it suffices to choose a field of size \(\mathrm {poly}(\log n)\), and thus \(\mathrm {depth}(\mathbb {F}), \mathrm {size}(\mathbb {F}) = \mathrm {poly}(\log \log n)\). Therefore these functions can be computed by formulae of depth \(d' = \mathrm {poly}(\log \log n)\) and size \(S' = \mathrm {poly}(\log n)\). Finally, by Theorem 3, the universal evaluator for this family has DREs of length \(O(4^{d'}) = \exp ({\mathrm {poly}(\log \log n)})\).

The secret keys and ciphertexts for each component are small. \(\varPi _{ FE }\) generates key and ciphertext components for up to T independent instantiations of \(\varPi _{ OFE }\). Each function for \(\varPi _{ OFE }\) corresponds to a formula of the form \(u_{r,\mathbb {F}}\) defined above. By Theorem 4, we can instantiate \(\varPi _{ OFE }\) so that each key component has length \(\exp ({\mathrm {poly}(\log \log n)})\) and each ciphertext component has length \(\kappa \cdot \exp (\mathrm {poly}(\log \log n)) = \mathrm {poly}(\kappa )\), where the last equality holds because \(n = \mathrm {poly}(\kappa )\).

The number of components T and the size of the field \(\mathbb {F}\) are small. In \(\varPi _{ FE }\) we take \(T = U^2 = (RD+1)^2\), where \(D \le 2^{d}\) is the degree of the polynomials computing the comparison function over \(\mathbb {F}\). As we argued above, we can take \(R = O(\log n)\) and \(D= \mathrm {poly}(\log n)\). Therefore we have \(T = \mathrm {poly}(\log n)\). We need to ensure that \(|\mathbb {F}| \ge T+1\), since the security analysis relies on the fact that each component \(t \in [T]\) corresponds to a different non-zero element of \(\mathbb {F}\). Therefore, it suffices to have \(|\mathbb {F}| = \mathrm {poly}(\log n)\). In particular, this justifies the calculations above involving the complexity of field operations.
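
As a sanity check on this bookkeeping, the following sketch traces the parameter choices for a concrete n; the constants hidden by the \(\varTheta (\cdot )\) and \(O(\cdot )\) notation are illustrative guesses, not values fixed by the proof.

```python
import math

def fe_parameters(n, c_R=4):
    log_n = math.ceil(math.log2(n))
    R = c_R * log_n                      # R = Theta(log n)
    d = math.ceil(math.log2(log_n))      # formula depth O(log log n)
    D = 2 ** d                           # degree D <= 2^d = poly(log n)
    U = R * D + 1                        # evaluation points per ciphertext
    T = U * U                            # number of components, T = U^2
    return {"R": R, "D": D, "U": U, "T": T, "min_field_size": T + 1}

# T grows only polylogarithmically in n (with these illustrative constants):
for k in (20, 40, 80):
    print(k, fe_parameters(2 ** k))
```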

Putting it together. By the above, each component of the secret keys has length \(\exp (\mathrm {poly}(\log \log n))\) and there are \(\mathrm {poly}(\log n)\) components, so the overall length of the keys for \(\varPi _{ FE }\) is \(\exp (\mathrm {poly}(\log \log n))\). Each component of the ciphertexts has length \(\mathrm {poly}(\kappa )\) and there are \(\mathrm {poly}(\log n) = \mathrm {poly}(\log \kappa )\) components, so the overall length of the ciphertexts for \(\varPi _{ FE }\) is \(\mathrm {poly}(\kappa )\). The theorem statement now follows by rescaling \(\kappa \) and converting the bounds on the lengths of the keys and ciphertexts into bounds on the number of possible keys and ciphertexts.

7 Two-Message Functional Encryption \(\Rightarrow \) Index Hiding

As discussed in Subsect. 3.2, Lemma 1 tells us that if we can show that any adversary's advantage in the \(\mathrm {TwoIndexHiding}\) game is small, then the underlying traitor-tracing scheme satisfies weak index-hiding security, which yields the lower bound of Theorem 2. First, note that a private-key functional encryption scheme for comparison functions can be used directly as a traitor-tracing scheme, since the two have the same functionality. We will now show that any private-key functional encryption scheme that is \((n,2,\frac{1}{300n^3})\)-secure is a secure traitor-tracing scheme in the \(\mathrm {TwoIndexHiding}\) game.

In Fig. 9, we describe a variant of the \(\mathrm {TwoIndexHiding}\) game from Fig. 2 that uses the simulator \(\mathsf {FE{.}Sim}\) for the functional encryption scheme \(\varPi _{ FE }= (\mathsf {FE{.}Setup}, \mathsf {FE{.}KeyGen}, \mathsf {FE{.}Enc}, \mathsf {FE{.}Dec})\) for comparison functions \(f_{y}(x) = \mathbb {I}\{x\ge y\}\), where \(x, y \in \{0,1\}^{\log n}\), that is \((n,2,\frac{1}{300n^3})\)-secure. Note that the challenger can give the simulator inputs that are independent of the game's \(b_0, b_1\): for every index \(j \ne i^*\) and all \(b_0, b_1 \in \{0,1\}\), the comparison function for j takes the same value on both inputs \(i^* - b_0, i^* - b_1\), namely \(\mathbb {I}\{j < i^*\}\) (if \(j < i^*\) then both inputs are at least j, and if \(j > i^*\) then neither is).
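
This observation is easy to verify directly; the following check, with an arbitrary small n and \(i^*\) chosen for illustration, confirms that for every \(j \ne i^*\) the value is \(\mathbb {I}\{j < i^*\}\) regardless of the challenge bit.

```python
# f_y(x) = I{x >= y}: for every j != i*, the value on input i* - b is the
# same for b = 0 and b = 1, namely I{j < i*}.
def f(y, x):
    return int(x >= y)

n, i_star = 16, 9
for j in range(1, n + 1):
    if j == i_star:
        continue
    assert {f(j, i_star - b) for b in (0, 1)} == {int(j < i_star)}
print("outputs for j != i* are independent of the challenge bit")
```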

Fig. 9. \(\mathrm {SimTwoIndexHiding}[i^*]\)

Defining:

$$\begin{aligned} \mathrm {SimTwoAdv}[i^*] = {\underset{\mathrm {SimTwoIndexHiding}[i^*]}{\mathbb {P}} \left[ b' = b_0 \oplus b_1\right] } - \frac{1}{2} \end{aligned}$$

We can then prove the following lemmas:

Lemma 5

For all p.p.t. adversaries, \(\mathrm {SimTwoAdv}[i^*] = 0\).

Proof

In \(\mathrm {SimTwoIndexHiding}[i^*]\), \(b_0, b_1\) are chosen uniformly at random, and the adversary's view is independent of them. Therefore, the probability that the adversary outputs \(b' = b_0 \oplus b_1\) is exactly \(\frac{1}{2}\), and so

$$\begin{aligned} \mathrm {SimTwoAdv}[i^*] = {\underset{\mathrm {SimTwoIndexHiding}[i^*]}{\mathbb {P}} \left[ b' = b_0 \oplus b_1\right] } - \frac{1}{2} = 0. \end{aligned}$$

Lemma 6

For all p.p.t. adversaries, \(|\mathrm {TwoAdv}[i^*] - \mathrm {SimTwoAdv}[i^*]| \le \frac{1}{300n^3}\).

Proof

This follows from the simulation security of the two-message FE scheme: the games \(\mathrm {TwoIndexHiding}[i^*]\) and \(\mathrm {SimTwoIndexHiding}[i^*]\) differ only in that the real keys and ciphertexts are replaced by the outputs of \(\mathsf {FE{.}Sim}\), whose inputs do not depend on \(b_0, b_1\). Since \(\varPi _{ FE }\) is \((n,2,\frac{1}{300n^3})\)-secure, the adversary's success probabilities in the two games differ by at most \(\frac{1}{300n^3}\), and hence so do the advantages.

We can now show that any adversary’s advantage in the \(\mathrm {TwoIndexHiding}\) game is small:

Lemma 7

Let

$$\begin{aligned} \varPi _{ FE }= (\mathsf {FE{.}Setup}, \mathsf {FE{.}KeyGen}, \mathsf {FE{.}Enc}, \mathsf {FE{.}Dec}) \end{aligned}$$

be a two-message functional encryption scheme for comparison functions \(f_{y}(x) = \mathbb {I}\{x\ge y\}\), where \(x, y \in \{0,1\}^{\log n}\), that is \((n,2,\frac{1}{300n^3})\)-secure. Then for all \(i^*\),

$$\begin{aligned} \mathrm {TwoAdv}[i^*] \le \frac{1}{300n^3}. \end{aligned}$$

Proof

Combining Lemma 5 and Lemma 6 gives \( \mathrm {TwoAdv}[i^*] \le \mathrm {SimTwoAdv}[i^*] + \frac{1}{300n^3} = \frac{1}{300n^3}, \) which is the statement of the lemma.

Combining Lemma 7 with Lemma 1, the \((n,2,\frac{1}{300n^3})\)-secure two-message functional encryption scheme from Sect. 6 is therefore an \((n, \{K_\kappa , C_\kappa \})\)-traitor-tracing scheme with weak index-hiding security. From Theorem 6, we have

$$\begin{aligned} |K_{\kappa }| = 2^{2^{\mathrm {poly}(\log \log n)}} = 2^{n^{o(1)}} \qquad \text {and} \qquad |C_{\kappa }| = 2^{\kappa }, \end{aligned}$$

which, combined with Theorem 2, gives our main result, Theorem 1.