1 Introduction

Searchable Encryption (SE), first proposed by Song et al. [30] and then formalized by Curtmola et al. [12], enables a data owner to outsource a private dataset \(\mathcal {D}\) to a server, so that the latter can answer keyword queries without learning too much information about the underlying dataset and the posed queries. An alternative to expensive primitives such as oblivious RAM and fully homomorphic encryption, SE schemes are practical at the expense of formally-specified leakage. In typical SE schemes, the data owner prepares a private index which is sent to the server. To perform a query on keyword w, the data owner engages in a protocol with the server such that by the end of the protocol the data owner retrieves the list of document identifiers \(\mathcal {D}(w)\) of documents containing w. During this process, the server should learn nothing except for the (number of) retrieved document identifiers—referred to as (size of) access pattern—and whether the keyword search query w was repeated in the past or not—referred to as search pattern.

To retrieve the document identifiers \(\mathcal {D}(w)\) (also referred to as keyword list in the rest of the paper), most existing SE schemes require the server to access approximately \(|\mathcal {D}(w)|\) randomly-assigned memory locations [10, 12, 14, 23, 24, 30, 31]. While this random allocation is essential for security, it creates a significant bottleneck when accessing large indexes stored on disk.Footnote 1 Therefore the aforementioned schemes cannot scale to big data that do not fit in memory, due to poor locality—the number of non-contiguous memory locations that must be read to retrieve the result.

Locality and Read Efficiency Trade-offs. One trivial way to design an SE scheme that has optimal locality \(L=1\) is to have the client download the whole encrypted index for every query w. Unfortunately, such an approach requires O(N) bandwidth, where N is the total number of keyword-document pairs. Cash et al. [10] were the first to observe this trade-off: To improve the locality of SE, one should expect to read additional entries per query. The ratio of the total number of entries read over the size of the query result was defined as read efficiency. This trade-off was subsequently formalized by Cash and Tessaro [11] who showed it is impossibleFootnote 2 to construct an SE scheme with linear space, optimal locality and optimal read efficiency.

In response to this impossibility result, several positive results with various trade-offs have appeared. Cash and Tessaro [11] presented a scheme with \(\varTheta (N\log N)\) space, O(1) read efficiency and \(O(\log N)\) locality, which was later improved to O(1) by Asharov et al. [6]. Demertzis and Papamanthou [16] presented a scheme with bounded locality \(O(N^{\epsilon })\), O(1) read efficiency and linear space (for constant \(\epsilon <1\)). More recently, Asharov et al. [5] studied the locality in Oblivious RAMs, proposing a construction that, for an arbitrary sequence of accesses (and therefore for SE as well), achieves O(N) space, \(O(\log ^2N\log ^2\log N)\) read efficiency, and \(O(\log N\log ^2\log N)\) locality. While asymptotically worse than [6], this work has better security as it leaks no access pattern. Finally, significant speedups due to locality in SE implementations have been observed by Miers and Mohassel [25] and Demertzis et al. [15, 16].

Constant Locality with Linear Space. Practical reasons described above have motivated the study of even more asymptotically-efficient SE schemes, and in particular those with constant locality and linear space. Asharov et al. [6] presented two such SE schemes based on a two-dimensional generalization of the “balls and bins” problem. In particular, the first scheme (A1) uses two-choice allocation and has very low read efficiency \(\varTheta (\log \log N\log ^2 \log \log N)\), but is based on the assumption that all lists \(\mathcal {D}(w)\) have size \(\le N^{1-1/\log \log N}\).Footnote 3 Recently, Asharov et al. [7] provided a version of A1 with improved read efficiency and a better bound \(N/\log ^3N\) for the maximum \(\mathcal {D}(w)\) size. The second construction of [6] (A2) has \(\tilde{\varTheta }(\log N)\) read efficiency, uses one-choice allocation and makes no assumptions about the dataset. To our knowledge, A2 is the best SE scheme with O(1) locality and O(N) space known to date for general datasets.

Our Contribution. Motivated by the above positive results and the impossibility result of [11], we ask whether it is possible to build an SE scheme for general datasets with: (i) linear space, (ii) constant locality, and (iii) sublogarithmic read efficiency. We answer this question in the affirmative by designing the first such SE scheme, strictly improving upon the best known scheme A2 [6]. For the rest of the paper we set \(\gamma =2/3+\delta \) for \(\delta >0\) arbitrarily small. We show that the read efficiency of our scheme is \(O(\log ^{\gamma } N)\) as opposed to that of A2 which is \(\varTheta (\log N\log \log N)\). Parameter \(\delta \) above affects the constants in the asymptotic notation which grow with \(O(1/\delta )\).

1.1 Summary of Our Techniques

Our techniques (like previous works on low-locality SE) use the notion of an allocation algorithm, whose goal is to store the dataset’s keyword lists in memory such that each keyword list \(\mathcal {D}(w)\) can be efficiently retrieved by accessing memory locations independent of the distribution of the SE dataset—this is needed for security reasons. Common techniques achieve this by storing keyword lists using a balls-and-bins procedure [6].

Starting Point. We first observe that keyword lists of size less than \(N^{1-1/\log ^{1-\gamma }N}\), for some \(\gamma <1\), can be allocated using (as a black box) the parameterized version of scheme A1 of Asharov et al. [6]. In particular, we show in Theorem 6 that for \(\gamma = 2/3+ \delta \) scheme A1 yields \(\varTheta (\log ^{\gamma } N)\) read efficiency, as desired. Therefore we only need to focus on allocating the dataset’s keyword lists that have size \(>N^{1-1/\log ^{1-\gamma } N}\).

Our Main Technique: Different Allocation Algorithms for Different Size Ranges. Let \(\gamma = 2/3+ \delta \) as defined above. We develop three allocation algorithms for the remaining ranges: Lists with size in \((N^{1-1/\log ^{1-\gamma } N},N/\log ^2 N]\), also called medium, are allocated using an offline two-choice allocation procedure [28] and multiple stashes to handle overflows. Lists with size in \((N/\log ^2 N,N/\log ^{\gamma } N]\), also called large, are first split into further subranges based on their size and then each subrange is allocated into a separate array using the same algorithm. Finally, for lists with size in \((N/\log ^{\gamma } N,N]\), also called huge, there is no special allocation algorithm: We just read the whole dataset. We now provide a summary of our allocation algorithms for medium and large lists.
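As a concrete illustration, the size-range dispatch described above can be expressed as a small classification function. This is our own sketch (the range names follow the text; the function itself is not part of the scheme):

```python
import math

def classify(size, N, gamma):
    """Classify a keyword-list size into the ranges used by the scheme.

    Range names follow the paper; the function itself is our sketch.
    """
    log_n = math.log2(N)
    if size <= N ** (1 - 1 / log_n ** (1 - gamma)):
        return "small"    # handled by the parameterized scheme A1 of [6]
    if size <= N / log_n ** 2:
        return "medium"   # offline two-choice allocation plus stashes
    if size <= N / log_n ** gamma:
        return "large"    # per-subrange buckets, read through ORAM
    return "huge"         # no special allocation: read the whole dataset
```

Note that these ranges are meaningful only asymptotically; for small N the medium range \((N^{1-1/\log ^{1-\gamma } N},N/\log ^2 N]\) can be empty.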

1.2 Medium Keyword Lists Allocation

Our allocation algorithm for medium keyword lists uses an offline two-choice allocation (OTA),Footnote 4 where there are m balls and n bins, and for each ball two possible bins are chosen independently and uniformly at random. After all choices have been made, one can run a maximum-flow algorithm to find the final assignment of balls to bins such that the maximum load is minimized. This strategy yields an almost perfectly balanced allocation (where the max-load is \(\le \lceil m/n\rceil +1\)) with probability at least \(1-O(1/n)\) [28].

Central Idea: One OTA Per Size and Then Merge. We use one OTA separately for every size s that falls in the range \((N^{1-1/\log ^{1-\gamma } N},N/\log ^2 N]\) as follows: Let \(\mathbf A _s\) be an array of M buckets \(\mathbf A _s[1],\mathbf A _s[2],\ldots ,\mathbf A _s[M]\), for some appropriately chosen M. One can visualize a bucket \(\mathbf A _s[i]\) as a vertical structure of unbounded capacity. Let \(k_s\) be the number of keyword lists of size s and let \(b_s=M/s\) be the number of superbuckets in \(\mathbf A _s\), where a superbucket is a collection of s consecutive buckets in \(\mathbf A _s\). We perform an OTA of \(k_s\) keyword lists to the \(b_s\) superbuckets. From [28], there will be at most \(\lceil k_s/b_s\rceil +1\) lists of size s in each superbucket with probability at least \(1-O(1/b_s)\), meaning the load of each bucket due to lists of size s will be at most \(\lceil k_s/b_s\rceil +1\) with the same probability, since each list of size s contributes exactly one element to each of the s buckets of its superbucket.

Our final allocation merges arrays \(\mathbf A _s\) for all sizes s corresponding to medium keyword lists into a new array \(\mathbf A \) of M buckets—see Fig. 5. To bound the final load of each bucket \(\mathbf A [i]\) in the merged array \(\mathbf A \) one can compute \(\sum _s (\lceil k_s/b_s\rceil +1)\) which is \(O(N/M+\log ^{\gamma } N)\)—see Lemma 4. If we set \(M=N/\log ^{\gamma } N\), our allocation occupies linear space and each bucket \(\mathbf A [i]\) has load \(O(\log ^{\gamma } N)\)—thus to read one list, one reads the two superbuckets initially picked by the OTA yielding read efficiency \(O(\log ^{\gamma } N)\).
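The per-size allocation and merge described above can be sketched as follows. This is a toy of ours: a greedy two-choice pass stands in for the max-flow-based OTA of [28], so the resulting loads only upper-bound the optimal ones.

```python
import random

def allocate_medium(lists_by_size, M, rng=random):
    """Sketch of the per-size allocation and merge: one two-choice
    allocation per size s into b_s = M // s superbuckets (groups of s
    consecutive buckets), all written into one merged array A of M
    buckets.  Greedy choice stands in for the max-flow OTA of [28]."""
    bucket_load = [0] * M
    placement = {}                    # (s, list index) -> superbucket
    for s, k_s in lists_by_size.items():
        b_s = M // s
        super_load = [0] * b_s
        for i in range(k_s):
            a, b = rng.randrange(b_s), rng.randrange(b_s)
            if super_load[b] < super_load[a]:
                a, b = b, a
            super_load[a] += 1
            placement[(s, i)] = a
            # a list of size s adds one element to each of the s
            # buckets of its superbucket
            for j in range(a * s, (a + 1) * s):
                bucket_load[j] += 1
    return bucket_load, placement
```

Here `lists_by_size` maps each (power-of-two) size s to the count \(k_s\); the returned per-bucket loads correspond to the loads of the merged array \(\mathbf A \).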

Handling Bucket Overflows with Additional Stashes. Our analysis above assumes the maximum load of each bucket is at most \(\lceil k_s/b_s\rceil +1\). However, there is a noticeable probability \(O(1/b_s)\) of overflowing beyond this bound—this would cause our allocation to fail, leaking information about the dataset. To deal with this problem, for each size s, we place the lists of size s that overflow in a stash \(\mathbf B _s\) (at the server) that can store up to \(O(\log ^2 N)\) such overflowing lists. In particular, we prove that when the OTA described previously is performed for medium lists, at most \(O(\log ^2 N)\) lists of size s overflow, except with negligible probability, and thus our stashes \(\mathbf B _s\) suffice—see Lemma 5. We also stress that we need the condition \(s\le N/\log ^2 N\) to keep the space of the stashes linear—see Theorem 7—justifying the choice of \(N/\log ^2 N\) as the endpoint of the range where we apply OTA. Finally, the existence of stashes \(\mathbf B _s\) differentiates our allocation from those of [6], allowing us to avoid their impossibility result (see discussion in Sect. 7).

New Probability Bounds for OTA. Our proof of the \(O(\log ^2 N)\) stash size extends the analysis of [28] non-trivially—we prove two new results in Sect. 3: First, in Theorem 1 we show that in an OTA, the probability that more than \(\tau \) bins overflow decreases as \((1/\tau )^\tau \). For this proof we show that the 0/1 random variables indicating bin overflow are negatively associated [17]. Second, in Theorem 2 we show that the probability that an OTA of m balls to n bins yields a maximum load greater than \(\lceil m/n \rceil +\tau \) is at most \(O(1/n)^\tau +e^{-\varOmega (n)}\).

Accessing Stashes Obliviously. Because a keyword list of size s might now live in the stash \(\mathbf B _s\), retrieving a keyword list \(\mathcal {D}(w)\) is a two-step process: First, access the superbuckets that were initially assigned by the OTA and then access a position x in the stash. In case \(\mathcal {D}(w)\) is not in the stash (because it was not an overflowing list), x should still be assigned a stash position chosen from the unoccupied ones, if such a position exists. If not, there will be a collision, in which case the adversary can deduce information about the dataset distribution, e.g., that the dataset contains at least \(\log ^2 N\) lists of size \(|\mathcal {D}(w)|\). To avoid such leakage, the stash must be accessed obliviously.
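The stash-position bookkeeping just described might be sketched as follows. This is our own simplification (it ignores repeated queries for the same keyword); the point is that every query probes exactly one stash position either way:

```python
def assign_stash_position(stash_map, free_positions, w):
    """Sketch of the stash-probe bookkeeping (our own simplification,
    ignoring repeated queries).  Every query reads exactly one stash
    position, so the server sees a stash access either way."""
    if w in stash_map:            # the list actually overflowed
        return stash_map[w]
    if free_positions:            # dummy probe: use a fresh free slot
        return free_positions.pop()
    return None                   # collision: unoccupied slots exhausted
```

Returning `None` models the collision case of the text, which is why the real scheme accesses the stash obliviously rather than through such plain bookkeeping.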

New ORAM with \(o(\sqrt{n})\) Bandwidth, O(1) Locality & Zero Failure Probability. Since the stash has only \(\log ^2 N\) entries of size \(|\mathcal {D}(w)|\) each, one can access it obliviously by reading it in its entirety. But this increases read efficiency to \(\log ^2 N\), which is no longer sublogarithmic. Thus, we need an ORAM with (i) O(1) locality, (ii) \(o(\sqrt{n})\) bandwidth and (iii) zero failure probability, since it will be applied on only \(\log ^2 N\) indices. In Sect. 4, we devise a new ORAM satisfying the above (with \(O(n^{1/3}\log ^2 n)\) bandwidth) based on one recursive application of Goldreich and Ostrovsky's square-root ORAM [18]. This protocol can be of independent interest. To finally ensure our new ORAM has O(1) locality, we use the I/O-efficient oblivious sorting algorithm of Goodrich and Mitzenmacher [20].

1.3 Large Keyword Lists Allocation

We develop an Algorithm \(\mathsf {AllocateLarge}(\mathsf {min},\mathsf {max})\) that can allocate lists with sizes in a general range \((\textsf {min},\textsf {max}]\). We will be applying this algorithm for lists in the range \((N/\log ^2 N,N/\log ^{\gamma } N]\). The algorithm works as follows. Let \(\mathbf A \) be an array that has 2N entries, organized in \(N/\textsf {max}\) buckets of capacity \(2\textsf {max}\) each. To store a list of size \(s\in (\textsf {min},\textsf {max}]\), a bucket with available size at least s is chosen. To retrieve a list, the entire bucket where the list is stored is accessed using our ORAM construction from Sect. 4—note that ORAM is relatively cheap for this range, since \(N/\textsf {max}\) is small.
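A minimal sketch of \(\mathsf {AllocateLarge}\) as described (our own code): first-fit stands in for “a bucket with available size at least s is chosen”. Since all sizes are at most \(\textsf {max}\), there are \(N/\textsf {max}\) buckets of capacity \(2\textsf {max}\), and the total list size is at most N, first-fit always finds a bucket.

```python
def allocate_large(list_sizes, N, max_size):
    """Sketch of AllocateLarge(min, max): place each list (all sizes in
    (min, max]) into the first bucket with enough free space.  There
    are N // max_size buckets of capacity 2 * max_size, i.e., 2N
    entries total, so first-fit cannot fail while the total list size
    is at most N."""
    num_buckets = N // max_size
    free = [2 * max_size] * num_buckets
    where = {}
    for w, s in list_sizes.items():
        for b in range(num_buckets):
            if free[b] >= s:          # first bucket that fits
                free[b] -= s
                where[w] = b
                break
        else:
            raise RuntimeError("total size exceeded N")
    return where
```

At query time the whole bucket `where[w]` would be read through the ORAM, hiding which list inside the bucket was of interest.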

In this way we always pay the cost of accessing a list of size \(\textsf {max}\), even for smaller list sizes \(s>\textsf {min}\). The worst-case read efficiency of this approach is therefore \(\varTheta (\textsf {max}/\textsf {min})\), which for the range specified above is \(\log ^2N/ \log ^{\gamma }N = \omega (\log N)\) for \(\gamma <1\). This is not enough for our target of sublogarithmic read efficiency. Therefore, we further split this range into multiple subranges and apply the algorithm to each subrange independently. The number of subranges depends on the target read efficiency, i.e., it depends on \(\gamma \) (but not on N). For example, for values of \(\gamma \) close to 1 it suffices to have 3 subranges, whereas setting \(\gamma = 0.75\) requires splitting \((N/\log ^2 N,N/\log ^{\gamma } N]\) into a fixed number of 11 subranges. In general, as \(\delta >0\) decreases and \(\gamma =2/3 + \delta \) gets closer to 2/3, the number of subranges increases. We note that using an ORAM with better worst-case bandwidth (e.g., \(O(n^{1/5}\log ^2 n)\) instead of \(O(n^{1/3}\log ^2 n)\)) would reduce the necessary number of subranges (see discussion in Sect. 7).

2 Notation and Definitions

We use the notation \(\langle C',S'\rangle \leftrightarrow \varPi \langle C,S\rangle \) to indicate that protocol \(\varPi \) is executed between a client with input C and a server with input S. After the execution of the protocol the client receives \(C'\) and the server receives \(S'\). Server operations are in light gray background. All other operations are performed by the client. The client typically interacts with the server via a write operation, with which the client encrypts data locally with a CPA-secure encryption scheme and writes the encrypted data remotely to the server, and via a read operation, with which the client reads encrypted data from the server and decrypts them locally.

In the following, \(\mathcal {D}\) will denote the searchable encryption dataset (SE dataset), which is a set of keyword lists \(\mathcal {D}(w_i)\). Each keyword list \(\mathcal {D}(w_i)\) is a set of keyword-document pairs \((w_i,id)\), called elements, where id is the identifier of a document containing keyword \(w_i\). We denote with N the size of our dataset, i.e., \(N=\sum _{w\in \mathbf W }|\mathcal {D}(w)|\), where \(\mathbf W \) is the set of unique keywords of our dataset \(\mathcal {D}\). Without loss of generality, we will assume that all keyword lists \(\mathcal {D}(w_i)\) have size \(|\mathcal {D}(w_i)|\) that is a power of two. This can always be enforced by padding with dummy elements, and increases the space by at most a factor of 2. Finally, a function \(f(\kappa )\) is negligible, denoted \(\mathsf {neg}(\kappa )\), if for sufficiently large \(\kappa \) it is less than \(1/p(\kappa )\) for all polynomials \(p(\kappa )\).
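The power-of-two padding just mentioned can be sketched as follows (a toy of ours; "dummy" marks the padding elements):

```python
def pad_to_power_of_two(dataset):
    """Pad each keyword list with dummy elements up to the next power
    of two.  Each list at most doubles, so N grows by a factor < 2."""
    padded = {}
    for w, ids in dataset.items():
        target = 1 << (len(ids) - 1).bit_length()   # next power of two
        padded[w] = list(ids) + ["dummy"] * (target - len(ids))
    return padded
```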

2.1 Searchable Encryption

Our new SE scheme uses as a black box a modification of the square-root ORAM protocol, which is a two-round protocol. Therefore, to model our SE scheme we use the protocol-based definition \(\left( \textsc {Setup}, \textsc {Search}\right) \) proposed by Stefanov et al. [31].

  • \(\langle st, \mathcal {I}\rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathcal {D}),1^\kappa \rangle \): \(\textsc {Setup}\) takes as input security parameter \(\kappa \) and SE dataset \(\mathcal {D}\) and outputs secret state st (for client), and encrypted index \(\mathcal {I}\) (for server).

  • \(\langle (\mathcal {D}(w), st'), \mathcal {I}' \rangle \leftrightarrow \textsc {Search}\langle (st,w), \mathcal {I}\rangle \): \(\textsc {Search}\) is a protocol between client and server, where the client’s input is secret state st and keyword w. Server’s input is encrypted index \(\mathcal {I}\). Client’s output is set of document identifiers \(\mathcal {D}(w)\) matching w and updated secret state \(st'\) and server’s output is updated encrypted index \(\mathcal {I}'\).
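For illustration only, the input/output shape of these two protocols can be fixed in code. This is an insecure toy of ours: the “index” is the plaintext dataset and the state a placeholder key; the function names are ours.

```python
from typing import Dict, List, Tuple

def se_setup(kappa: int, dataset: Dict[str, List[int]]) -> Tuple[bytes, dict]:
    """Insecure toy stand-in for Setup: the 'index' is the plaintext
    dataset and the state a placeholder key -- it only fixes the
    input/output shape of the protocol."""
    st = bytes(kappa // 8)                         # placeholder state
    index = {w: list(ids) for w, ids in dataset.items()}
    return st, index

def se_search(st: bytes, w: str, index: dict):
    """Toy stand-in for Search: returns (D(w), updated state, updated
    index); here both state and index come back unchanged."""
    return index.get(w, []), st, index
```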

Just like in previous works [6], the goal of our SE protocols is for the client to retrieve the document identifiers (i.e., the list \(\mathcal {D}(w)\)) for a specific keyword w. The documents themselves can be downloaded from the server in a second round, by just providing \(\mathcal {D}(w)\). This step is orthogonal to our protocols and we do not explicitly model it here. We also note that we focus only on static SE. However, by using generic techniques, e.g., [14], we can extend our schemes to the dynamic setting. The correctness definition of SE is given in the extended version [13]. We now provide the security definition.

Fig. 1. Real and ideal experiments for the SE scheme.

Definition 1

(Security of SE). An SE scheme \((\textsc {Setup},\textsc {Search})\) is secure in the semi-honest model if for any PPT adversary \(\mathsf {Adv}\), there exists a stateful PPT simulator \((\textsc {SimSetup},\textsc {SimSearch})\) such that

$$ |\Pr [{\mathbf {Real}}^{\mathsf {SE}}(\kappa )=1]-\Pr [{\mathbf {Ideal}}^{\mathsf {SE}}_{\mathcal {L}_1,\mathcal {L}_2}(\kappa )=1]|\le \mathsf {{neg}}(\kappa )\,, $$

where experiments \({\mathbf {Real}}^{\mathsf {SE}}(\kappa )\) and \({\mathbf {Ideal}}^{\mathsf {SE}}_{\mathcal {L}_1,\mathcal {L}_2}(\kappa )\) are defined in Fig. 1 and where the randomness is taken over the random bits used by the algorithms of the SE scheme, the algorithms of the simulator and \(\mathsf {Adv}\).

Leakage Functions \(\mathcal {L}_1\) and \(\mathcal {L}_2\). As in prior work [6], \(\mathcal {L}_1\) and \(\mathcal {L}_2\) are leakage functions such that \(\mathcal {L}_1(\mathcal {D}_0)=|\mathcal {D}_0|=N\) and \(\mathcal {L}_2(w_i)\) leaks the access pattern size \(|\mathcal {D}(w_i)|\) and the search pattern of \(w_i\). Formally for a keyword \(w_i\) searched at time i, \(\mathcal {L}_2(w_i)\) is

$$\begin{aligned} \mathcal {L}_2(w_i)= \left\{ \begin{array}{ll} (|\mathcal {D}(w_i)|,j) &{} \text {if}\,\, w_i\,\, \hbox {was searched at time}\,\,{j<i} \\ (|\mathcal {D}(w_i)|,\bot ) &{} \text {if}\,\, w_i\,\,\hbox {was never searched before}\\ \end{array} \right. \,. \end{aligned}$$
(1)
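A direct transcription of \(\mathcal {L}_2\) in code (our own sketch; we take j to be the most recent prior search time for \(w_i\), and represent \(\bot \) as `None`):

```python
def leakage_L2(history, w, size):
    """Sketch of Eq. (1): leak |D(w)| together with the search pattern.
    `history` lists the keywords searched at times 0, 1, ...; j is the
    most recent prior search for w, and None stands for "bot"."""
    if w in history:
        j = max(i for i, v in enumerate(history) if v == w)
        return (size, j)
    return (size, None)
```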

2.2 Oblivious RAM

Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky [18], is a compiler that encodes the memory such that accesses to the compiled memory do not reveal the access pattern on the original memory. Formal correctness and security definitions of ORAM are given in the Appendix. We give the definition for a read-only ORAM, as this is what our scheme needs—the definition naturally extends to writes as well:

  • \(\langle \sigma , \mathsf {EM}\rangle \leftrightarrow \textsc {OramInitialize}\langle (1^\kappa , \mathsf {M}),1^\kappa \rangle \): \(\textsc {OramInitialize}\) takes as input security parameter \(\kappa \) and memory array \(\mathsf {M}\) of n values \((1,v_1),\ldots ,(n,v_n)\) of \(\lambda \) bits each and outputs secret state \(\sigma \) (for client), and encrypted memory \(\mathsf {EM}\) (for server).

  • \(\langle (v_i, \sigma '), \mathsf {EM}' \rangle \leftrightarrow \textsc {OramAccess}\langle (\sigma ,i), \mathsf {EM}\rangle \): \(\textsc {OramAccess}\) is a protocol between client and server, where the client’s input is secret state \(\sigma \) and an index i. Server’s input is encrypted memory \(\mathsf {EM}\). Client’s output is value \(v_i\) assigned to i and updated secret state \(\sigma '\). Server’s output is updated encrypted memory \(\mathsf {EM}'\).
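As a baseline illustration of this interface, consider the trivial linear-scan ORAM (a toy of ours, not the construction of Sect. 4): it reads the entire memory on every access, so it is perfectly oblivious with O(1) locality (one contiguous scan) but O(n) bandwidth.

```python
from typing import List, Optional, Tuple

def oram_initialize(memory: List[Tuple[int, bytes]]):
    """Toy OramInitialize: a real scheme would encrypt and shuffle the
    entries; here the 'encrypted memory' is just the array itself."""
    em = list(memory)            # server-side memory
    sigma = {"n": len(em)}       # client-side state
    return sigma, em

def oram_access(sigma, i, em) -> Tuple[Optional[bytes], dict, list]:
    """Trivial linear-scan ORAM: scan all n entries on every access, so
    the server's view is independent of i.  O(1) locality (one
    contiguous scan) but O(n) bandwidth."""
    result = None
    for idx, v in em:            # read everything regardless of i
        if idx == i:
            result = v
    return result, sigma, em
```

The square-root construction and the \(O(n^{1/3}\log ^2 n)\)-bandwidth ORAM of Sect. 4 improve on exactly this O(n) bandwidth while keeping the access pattern independent of i.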

Fig. 2. Offline two-choice allocation of m balls to n bins.

3 New Bounds for Offline Two-Choice Allocation

As mentioned in the introduction, our medium-list allocation uses a variation of the classic balls-in-bins problem, known as offline two-choice allocation—see Fig. 2. Assume m balls and n bins. In the selection phase, for the i-th ball, two bins \(a_i\) and \(b_i\) are chosen independently and uniformly at random. After selection, in a post-processing phase, the i-th ball is mapped to either bin \(a_i\) or \(b_i\) such that the maximum load is minimized. This assignment is achieved by a maximum flow algorithm [28] (for completeness, we provide this algorithm in Fig. 13 in the Appendix). The bin that ball i is finally mapped to is stored in an array \(\textsf {chosen}[i]\) whereas the other bin that was chosen for ball i is stored in an array \(\textsf {alternative}[i]\). Let \(L^{*}_\textsf {max}\) denote the maximum load across all bins after this allocation process completes. Sanders et al. [28] proved the following.

Lemma 1

(Sanders et al. [28]). Algorithm \({\mathsf {OfflineTwoChoiceAllocation}}\) in Fig. 2 outputs an allocation \({\mathsf {chosen}}\) of m balls to n bins such that \(L^{*}_{\mathsf {max}}>\lceil \frac{m}{n}\rceil +1\) with probability at most O(1/n).Footnote 5 Moreover, the allocation can be performed in time \(O(n^3)\).
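The two phases of \(\mathsf {OfflineTwoChoiceAllocation}\) can be sketched as follows. Note that the post-processing here is a greedy stand-in (each ball goes to whichever of its two bins is currently lighter), whereas the actual algorithm computes the optimal assignment via maximum flow, so the loads below only upper-bound the optimal ones.

```python
import random

def offline_two_choice_allocation(m, n, rng=random):
    """Selection phase as in Fig. 2: two bins per ball, chosen
    independently and uniformly at random.  Post-processing: greedy
    stand-in for MaxFlowSchedule -- each ball goes to whichever of its
    two bins is currently lighter."""
    chosen, alternative = [0] * m, [0] * m
    load = [0] * n
    for i in range(m):
        a, b = rng.randrange(n), rng.randrange(n)  # selection phase
        if load[b] < load[a]:                      # pick the lighter bin
            a, b = b, a
        chosen[i], alternative[i] = a, b
        load[a] += 1
    return chosen, alternative, load
```

The arrays `chosen` and `alternative` play the roles of the paper's \(\textsf {chosen}[i]\) and \(\textsf {alternative}[i]\).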

For our purposes, the bounds derived by Sanders et al. [28] do not suffice. In the following we derive new bounds. In particular:

  1. In Sect. 3.1, we derive probability bounds on the number of overflowing bins, i.e., the bins that contain more than \(\lceil \frac{m}{n}\rceil +1\) balls after the allocation completes.

  2. In Sect. 3.2, we derive probability bounds on the overflow size, i.e., the number of balls beyond \(\lceil \frac{m}{n}\rceil +1\) that a bin contains.

  3. In Sect. 3.3, we combine these to bound the total number of overflowing balls.

3.1 Bounding the Number of Overflowing Bins

For every bin \(\ell \in [n]\), let us define a random 0-1 variable \(Z_\ell \) such that \(Z_\ell \) is 1 if bin \(\ell \) contains more than \(\lceil \frac{m}{n}\rceil +1\) balls after \(\mathsf {OfflineTwoChoiceAllocation}\) returns and 0 otherwise. We want to bound the random variable \(Z = \sum _{\ell =1}^n Z_\ell \), representing the total number of overflowing bins. Unfortunately we cannot use a Chernoff bound directly, since (i) the variables \(Z_\ell \) are not independent; and (ii) we do not know the exact expectation \(\mathbb {E}[Z]\). However, we observe that if we show that the variables \(Z_\ell \) are negatively associated (at a high level, negative association indicates that for a set of variables, whenever some of them increase the rest tend to decrease—see the Appendix for a precise definition) and if we get an upper bound on \(\mathbb {E}[Z]\), we can then derive a Chernoff-like bound for the number of overflowing bins. We begin by proving the following.

Lemma 2

The set of random variables \(Z_1,Z_2,\ldots ,Z_n\) is negatively associated.

Proof

For all \(i\in [n]\), \(j\in [n]\) and \(k\in [m]\) let \(X_{ijk}\) be the random variable such that

$$\begin{aligned} X_{ijk} = {\left\{ \begin{array}{ll} \; 1 &{} \text {if}\,\, \mathsf {OfflineTwoChoiceAllocation} \,\,\hbox {chose the two bins} \,\,i\,\, \hbox {and}\,\, j\,\, \hbox {for ball}\,\, k\\ \; 0 &{} \text {otherwise} \end{array}\right. }. \end{aligned}$$

For each k it holds that \(\sum _{i,j}X_{ijk} =1\), since only one pair of bins is chosen for ball k. Therefore, by [17, Proposition 11], it follows that each set \(\mathbf {X}_k = \{X_{ijk}\}_{i\in [n],j\in [n]}\) is negatively associated. Moreover, since the sets \(\mathbf {X}_k,\mathbf {X}_{k'}\) for \(k\ne k'\) consist of mutually independent variables (as the selection of bins is made independently for each ball), it follows from [17, Proposition 7.1] that the set \(\mathbf {X} = \{X_{ijk}\}_{i\in [n],j\in [n],k\in [m]}\) is negatively associated. Now consider the disjoint sets \(U_{\ell }\) for \(\ell \in [n]\) defined as \(U_\ell = \{X_{ijk} \;|\; \textsf {chosen}[k]=\ell \;\wedge \; (\ell =i \vee \ell =j)\}\,,\) where \(\textsf {chosen}\) is the array output by \(\textsf {OfflineTwoChoiceAllocation}\). Let us now define \(h_\ell (X_{ijk}, X_{ijk}\in U_\ell ) = \sum _{X_{ijk}\in U_{\ell }} X_{ijk}\) for \(\ell \in [n]\). Clearly each \(h_\ell \) is a non-decreasing function and therefore by [17, Proposition 7.2] the set of random variables \(\mathbf {Y} = \{Y_\ell \}_{\ell \in [n]}\) where \(Y_\ell = h_\ell \) is also negatively associated. We can finally define \(Z_\ell \) for \(\ell =1,\ldots ,n\) as

$$\begin{aligned} Z_\ell = f(Y_\ell ) = {\left\{ \begin{array}{ll} 0 &{} \text {if }Y_\ell \le \lceil m/n \rceil + 1 \\ 1 &{} \text {otherwise} \end{array}\right. }. \end{aligned}$$

Since f is also a non-decreasing function (whenever \(Y_\ell \) grows, \(Z_\ell =f(Y_\ell )\) can only increase), it follows, again by [17, Proposition 7.2], that the set of random variables \(Z_1,Z_2,\ldots ,Z_n\) is also negatively associated.    \(\square \)

Lemma 3

The expected number of overflowing bins \(\mathbb {E}[Z]\) is O(1).

Proof

For all bins \(\ell \in [n]\), we have \(\mathbb {E}[Z_\ell ]=\Pr [Y_\ell> \lceil m/n \rceil + 1] \le \Pr [L^{*}_{\textsf {max}} > \lceil m/n \rceil + 1]=O(1/n)\), by Lemma 1 (where \(L^{*}_\textsf {max}\) is the maximum load across all bins after allocation). By linearity of expectation and since \(Z=\sum _{\ell } Z_\ell \), we get \(\mathbb {E}[Z]=O(1)\).    \(\square \)

Theorem 1

Assume \(\mathsf {{OfflineTwoChoiceAllocation}}\) from Fig. 2 is used to allocate m balls into n bins. Let Z be the number of bins that receive more than \(\lceil m/n\rceil +1\) balls. Then there exists a fixed positive constant c such that for sufficiently large n (see Footnote 6) and for any \(\tau >1\) we have \(\Pr [Z \ge c \cdot \tau ] \le \left( \frac{e}{\tau }\right) ^{c \cdot \tau }.\)

Proof

By Lemma 3 we have that there exists a fixed constant c such that \(\mathbb {E}[Z]\le c\) for sufficiently large n. Therefore, by Lemmas 2 and 8 in the Appendix (where we set \(\mu _H=c\) since \(\mathbb {E}[Z]\le c\)) we have that for any \(\delta >0\)

$$\begin{aligned} \Pr [Z \ge (1+\delta )\cdot c] \le \left( \frac{e^{\delta }}{(1+\delta )^{(1+\delta )}}\right) ^{c}\le \left( \frac{e^{1+\delta }}{(1+\delta )^{(1+\delta )}}\right) ^{c}\,. \end{aligned}$$

Setting \(\delta =\tau -1\) which is \(>0\) for \(\tau >1\), we get the desired result.    \(\square \)

3.2 Bounding the Overflow Size

Next, we turn our attention to the number of balls \(Y_\ell \) that can be assigned to bin \(\ell \). In particular, we want to bound the probability \(\Pr [Y_\ell > \lceil m/n \rceil + \tau ]\) for a general parameter \(\tau \ge 2\)—Sanders et al. [28] studied only the case \(\tau =1\). To do so, we bound the probability that, after \(\textsf {OfflineTwoChoiceAllocation}\) returns, the maximum load \(L^{*}_\textsf {max}\) is larger than \(\lceil m/n \rceil +\tau \) for \(\tau \ge 2\). We now prove the following.

Theorem 2

Assume \({\mathsf {OfflineTwoChoiceAllocation}}\) from Fig. 2 is used to allocate m balls into n bins. Let \(L^{*}_{\mathsf {max}}\) be the maximum load across all bins. Then for any \(\tau \ge 2\)

$$\begin{aligned} \Pr \left[ L^*_{{\mathsf {max}}} \ge \left\lceil \frac{m}{n}\right\rceil +\tau \right] \le O(1/n)^{\tau }+O(\sqrt{n}\cdot 0.9^{n})\,. \end{aligned}$$

Proof

Our analysis here closely follows the one of [28]. Without loss of generality, we assume the number of balls m to be a multiple of the number of bins n (see Footnote 7) and we set \(b=m/n\). Let now \((a_i,b_i)\) be the two random choices that \(\textsf {OfflineTwoChoiceAllocation}\) makes for ball i, where \(i=1,\ldots ,m\). For a subset \(U \subseteq \{1,\ldots ,n\}\) of bins we define the random variables \(X^{U}_1,\dots ,X^{U}_{m}\) such that \(X^{U}_{i} = 1\) if \(a_i,b_i \in U\), and 0 otherwise, i.e., \(X^{U}_i\) is 1 only if both selections for the i-th ball are from subset U, which unavoidably leads to this ball being assigned to a bin within subset U. The random variable \(L_U = \sum _{i=1}^mX^{U}_i\) is called the unavoidable load of U. Also, for a set U and a parameter \(\tau \), let \(P_{U}= \Pr [L_U \ge (b+\tau )|U| +1]\). Finally, let \(L^{*}_{\textsf {max}}\) be the optimal load, namely the minimum maximum load over all possible allocations given the random choices \((a_1,b_1),\ldots ,(a_m,b_m)\). Since \(\mathsf {MaxFlowSchedule}\) computes an allocation with the optimal load, we must compute the probability \(\Pr [L^{*}_{\textsf {max}}>b+\tau ]\), where \(\tau \ge 2\). From [29, Lemma 5] we have \(L^*_{\textsf {max}} = \displaystyle \max _{\emptyset \ne U\subseteq \{1,\dots ,n\}}\left\{ L_U/|U|\right\} \). Thus,

$$\begin{aligned} \Pr [L^*_{\mathsf {max}}> b+\tau ] =&\Pr [\exists U \subseteq [n]: L_U/|U| > b+\tau ]\\ \le&\textstyle \sum _{\emptyset \ne U\subseteq [n]}\Pr [L_U \ge (b+\tau )|U| +1] =\sum _{|U|=1}^n \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U}\,, \end{aligned}$$

where the inequality follows from a simple union bound and for the last step we used the fact that \(P_U\) is the same for all sets U of the same cardinality. This is because for all sets \(U_1\) and \(U_2\) with \(|U_1|=|U_2|\) we have that \(\Pr [L_{U_1} \ge (b+\tau )|U_1| +1]=\Pr [L_{U_2} \ge (b+\tau )|U_2| +1]\) since \(U_1\) and \(U_2\) are identically distributed. Next, we need to bound the sum \(\sum _{|U|=1}^n \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U}\). For this we will split the sum into three separate summands

$$\begin{aligned} T_1 = \displaystyle \sum _{1\le |U| \le \frac{n}{8}} \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U},\text { } T_2 = \displaystyle \sum _{\frac{n}{8}< |U| <\frac{nb}{b+\tau }} \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U} \text { and } T_3 = \displaystyle \sum _{\frac{nb}{b+\tau }\le |U| \le n } \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U}. \end{aligned}$$

We begin with the simple observation that \(T_3 = 0\). To see why, note that for \(|U| \ge nb/(b+\tau )\) it holds that \(P_{U} = \Pr [L_U \ge (b+\tau )|U| +1] = \Pr [L_U \ge (b+\tau )nb/(b+\tau ) +1] = \Pr [L_U \ge m +1] = 0\) as m is a natural upper bound for \(L_U\) (i.e., if both selections fall within U for all balls). Regarding \(T_2\), from [28, Lemma 9] we have \(\sum _{\frac{n}{8}< |U| <\frac{nb}{b+1}} \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P^*_{U} = O(\sqrt{n}\cdot 0.9^{n})\), where \(P^*_{U} = \Pr [L_U \ge (b+1)|U| +1]\). Clearly, for all U, \(P_{U} \le P^*_{U}\). Moreover, \(\sum _{\frac{n}{8}< |U|<\frac{nb}{b+\tau }}P^*_{U} \le \sum _{\frac{n}{8}< |U| <\frac{nb}{b+1}}P^*_{U} \) for all \(\tau \ge 2\). Putting it all together, we have

$$\begin{aligned} T_2 \le \sum _{\frac{n}{8}< |U| <\frac{nb}{b+1}} \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P^*_{U} = O(\sqrt{n}\cdot 0.9^{n}). \end{aligned}$$

By Lemma 9 in the Appendix, \(T_1=O(1/n)^{b+\tau }\) for all \(\tau \ge 2\); hence, since \(b\ge 0\), for all \(\tau \ge 2\) we have \(\sum _{|U|=1}^n \left( {\begin{array}{c}n\\ |U|\end{array}}\right) P_{U}=T_1+T_2+T_3=O(1/n)^{\tau }+O(\sqrt{n}\cdot 0.9^{n})\), which completes the proof.    \(\square \)

3.3 Bounding the Total Number of Overflowing Balls

Let T be the number of overflowing balls, i.e., \(T = \sum _{i=1}^\ell Z_i(Y_i-\lceil m/n\rceil -1)\). Using Theorems 1 and 2, and by a simple application of the law of total probability, we can now prove the following result.

Theorem 3

Assume \({\mathsf {OfflineTwoChoiceAllocation}}\) from Fig. 2 is used to allocate m balls into n bins. Let T be the number of overflowing balls as defined above. Then there exist positive constants \(c, c_1, c_2\) such that for large n and for any \(\tau \ge 2\),

$$ \Pr [T>c\cdot \tau ^2] \le \left( \frac{e}{\tau }\right) ^{c\cdot \tau }+\left( \frac{c_1}{n}\right) ^{\tau }+c_2\sqrt{n}\cdot 0.9^{n}\,. $$

Proof

Define the events \(E: T>c\cdot \tau ^2\), \(E_1: Z > \tau \) and \(E_2: L^*_{\textsf {max}} > \lceil m/n \rceil +\tau \), for some \(\tau \ge 2\). If neither \(E_1\) nor \(E_2\) occurs, then at most \(\tau \) bins overflow and each of them contributes at most \(\tau \) overflowing balls, so \(T\le \tau ^2\le c\cdot \tau ^2\). This implies that \(E \subseteq E_1 \cup E_2\). By a standard union bound and applying Theorems 1 and 2, we have \(\Pr [E]\le \left( \frac{e}{\tau }\right) ^{c\cdot \tau }+O(1/n)^{\tau }+O(\sqrt{n}\cdot 0.9^{n})\), which completes the proof by taking \(c_1\) and \(c_2\) to be the constants in \(O(1/n)\) and \(O(\sqrt{n}\cdot 0.9^{n})\) respectively.    \(\square \)
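As a sanity check, the quantity T can be computed directly from the bin loads. The following sketch (function and parameter names are ours, not from the paper) counts the balls exceeding the allowed load \(\lceil m/n\rceil +1\) in each bin:

```python
import math

def count_overflowing_balls(loads, m, n):
    # Bin i overflows when its load Y_i exceeds ceil(m/n) + 1;
    # it then contributes Y_i - ceil(m/n) - 1 overflowing balls to T.
    threshold = math.ceil(m / n) + 1
    return sum(y - threshold for y in loads if y > threshold)
```

For example, allocating m = 10 balls into n = 4 bins gives threshold \(\lceil 10/4\rceil +1=4\), so loads [6, 4, 0, 0] yield T = 2: only the first bin overflows.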

4 New ORAM with O(1) Locality and \(o(\sqrt{n})\) Bandwidth

Our constant-locality SE construction uses an ORAM scheme as a black box. In particular, the ORAM scheme that is used must have the following properties:

  1. It needs to have constant locality, meaning that for each oblivious access it should only read O(1) non-contiguous locations in the encrypted memory. Existing ORAM constructions with polylogarithmic bandwidth have logarithmic locality. For example, a path ORAM access [33] traverses \(\log n\) binary tree nodes stored in non-contiguous memory locations—therefore we cannot use it here. This property is required as our underlying SE scheme must have O(1) locality;

  2. It needs to have bandwidth cost \(o(\sqrt{n} \cdot \lambda )\). This property is required because we apply the ORAM scheme to an array of \(O(\log ^2 N)\) entries, yielding overall bandwidth \(o(\log N \cdot \lambda )\), which implies sublogarithmic read efficiency for the underlying SE scheme.

We note here that an existing scheme that almost satisfies both properties above is the ORAM construction from [27, Theorem 7] by Ohrimenko et al. (where we set \(c=3\)). This ORAM has O(1) locality and \(O(n^{1/3}\log n \cdot \lambda )\) bandwidth. However we cannot apply it here due to its failure probability which is \(\mathsf {neg}(n)\), where n is the size of the memory array. Unfortunately, since our array has \(O(\log ^2 N)\) entries (N is the size of the SE dataset), this gives a probability of failure \(\mathsf {neg}(\log ^2 N)\) which is not \(\mathsf {neg}(N)\).

Our proposed ORAM construction is a hierarchical application of the square-root ORAM construction of Goldreich and Ostrovsky [18]. Here, we provide a description of the amortized version of our construction (i.e., the read-efficiency and locality bounds we achieve are amortized over n accesses) in Fig. 3. The deamortized version of our ORAM construction is achieved by applying, in a straightforward manner, techniques of Goodrich et al. [21] for deamortizing the square-root ORAM (a formal description and analysis of the deamortized version can be found in the extended version [13]).

ORAM Setup. Given memory \(\mathsf {M}\) with n index-value pairs \((1,v_1),\ldots ,(n,v_n)\) we allocate three main arrays for storage: A of size \(n_a=n+n^{2/3}\), B of size \(n_b=n^{2/3}+n^{1/3}\), and C of size \(n_c=n^{1/3}\). Initially A stores all elements encrypted with CPA-secure encryption and permuted with a pseudorandom permutationFootnote 8 \(\pi _a: [n_a]\rightarrow [n_a]\) and B and C are empty, containing encryptions of dummy values. We also initialize another pseudorandom permutation \(\pi _b: [n_b]\rightarrow [n_b]\) used for accessing elements from array B. In particular, if an element \(x\in [n]\) is stored in array B, it is located at position \(\pi _b[\mathsf {Tab}[x]]\) of B, where \(\mathsf {Tab}\) is a locally-stored hash table mapping an element \(x\in [n]\) to \(\mathsf {Tab}[x]\in [n_b]\). Note the hash table is needed to index elements in B as \(n_b<n\).

ORAM Access. To access element x, the algorithm always downloads, decrypts and sequentially scans array C. Similarly to the square-root ORAM, we consider two cases:

  1. Element x is in C. In this case the requested element has been found and the algorithm performs two additional dummy accesses for security reasons: it accesses a randomFootnote 9 position in array A and a random position in array B.

  2. Element x is not in C. In this case we distinguish the following subcases.

    • Element x is not in B.Footnote 10 In this case x can be retrieved by accessing the random position \(\pi _a[x]\) of array A. As before, the algorithm also accesses a random position in array B.

    • Element x is in B. In this case x can be retrieved by accessing the random position \(\pi _b[\mathsf {Tab}[x]]\) of array B. As before, the algorithm also accesses a random position in array A.

After the access above, the retrieved element x is written in the next available position of C, the algorithm computes a fresh encryption of C and writes C back to the server. Just like in square-root ORAM, some oblivious reshuffling must occur: In particular, every \(n^{1/3}\) accesses, array C becomes full and both C and the contents of B are obliviously reshuffled into B. Every \(n^{2/3}\) accesses, when B becomes full, all elements are obliviously reshuffled into A. We describe this reshuffling process next.

Reshuffling, epochs and superepochs. Our algorithm for obliviously accessing an element x, described above, proceeds in epochs and superepochs. An epoch is defined as a sequence of \(n^{1/3}\) accesses. A superepoch is defined as a sequence of \(n^{2/3}\) accesses.

At the end of every epoch C becomes full, and all elements in C along with the ones that have been accessed in the current superepoch (and are now stored in B) are obliviously reshuffled into B using a fresh pseudorandom permutation \(\pi _b\). In our implementation in Fig. 3, we store all the elements that must be reshuffled in an array \(\mathsf {SCRATCH}\). After the reshuffling, C can be emptied (denoted with \(\bot \) in Line 30) so that it can be used again in the future. At the end of every superepoch all the elements of the dataset are obliviously reshuffled into array A using a fresh pseudorandom permutation \(\pi _a\) and arrays B, C and \(\mathsf {SCRATCH}\) are emptied.
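The access-and-reshuffle logic above can be sketched as follows. This is a plaintext stand-in (class and method names are ours): encryption, the pseudorandom permutations and the oblivious-sort reshuffles are replaced by local dictionary operations, so only the read-only three-level A/B/C structure and the epoch/superepoch schedule are illustrated:

```python
import random

class ThreeLevelORAM:
    """Plaintext sketch of the access pattern of the A/B/C hierarchy.
    Encryption, the pseudorandom permutations and the oblivious-sort
    reshuffles are replaced by local dictionary operations."""

    def __init__(self, values):             # values: dict index -> value
        self.n = len(values)
        self.n3 = round(self.n ** (1 / 3))  # epoch length n^{1/3}
        self.A = dict(values)               # bottom level (all n elements)
        self.B = {}                         # middle level, ~n^{2/3} capacity
        self.C = {}                         # top level, n^{1/3} capacity
        self.accesses = 0

    def access(self, x):
        self.accesses += 1
        if x in self.C:                     # case 1: element cached in C
            v = self.C[x]
            _ = random.random()             # stand-in: dummy access to A
            _ = random.random()             # stand-in: dummy access to B
        elif x in self.B:                   # case 2b: element in B
            v = self.B[x]
            _ = random.random()             # stand-in: dummy access to A
        else:                               # case 2a: element only in A
            v = self.A[x]
            _ = random.random()             # stand-in: dummy access to B
        self.C[x] = v                       # cache the element in C
        if self.accesses % self.n3 == 0:    # epoch: reshuffle C (and B) into B
            self.B.update(self.C)
            self.C = {}
        if self.accesses % (self.n3 ** 2) == 0:  # superepoch: all back into A
            self.B = {}
        return v
```

In the real construction the two dummy reads hit random positions of the (encrypted, permuted) arrays A and B, and each reshuffle is a locality-friendly oblivious sort rather than a local dictionary merge.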

Oblivious Sorting with Good Locality. As in previous works, our reshuffling in the ORAM protocol is performed using an oblivious sorting protocol. Since we are using the ORAM scheme in an SE scheme that must have good locality, we must ensure that the oblivious sorting protocol used has good locality as well, i.e., it does not access too many non-contiguous locations. One way to achieve that is to download the whole encrypted array, decrypt it, sort it and encrypt it back. This has excellent locality \(L=1\) but requires linear client space. A standard oblivious sorting protocol such as Batcher’s odd-even mergesort [8] does not work either since its locality can be linear.

Fig. 3. Read-only ORAM construction with \(O(n^{1/3}\log ^2 n \cdot \lambda )\) amortized bandwidth and O(1) amortized locality.

Fortunately, Goodrich and Mitzenmacher [20] developed an oblivious sorting protocol for an external memory setting that is a perfect fit for our application—see Fig. 16 in the Appendix. The client interacts with the server only by reading and writing b consecutive blocks of memory. We call each b-block access (either for read or write) an I/O operation. The performance of their protocol is characterized in the following theorem.

Theorem 4

(Goodrich and Mitzenmacher [20], Goodrich [19]). Given an array X containing n comparable blocks, we can sort X with a data-oblivious external-memory protocol that uses \(O((n/b)\log ^2 (n/b))\) I/O operations and local memory of 4b blocks, where an I/O operation is defined as the read/write of b consecutive blocks of X.

In the above oblivious sorting protocol, value b (the number of consecutive blocks downloaded/uploaded in one I/O) can be parameterized, affecting the local space accordingly. In our case, we set b to be equal to \(n^{1/3}\log ^2 n\)—see Lines 24 and 29 in Fig. 3, which is enough for achieving constant locality in our SE scheme.

Our final result is as follows (proof can be found in the Appendix).

Theorem 5

Let n be the size of the memory array and \(\lambda \) be the size of the block. Our ORAM scheme (i) is correct according to Definition 2; (ii) is secure according to Definition 3, assuming pseudorandom permutations and CPA-secure encryption; (iii) has \(O(n^{1/3}\log ^2 n \cdot \lambda )\) amortized bandwidth and O(1) amortized locality per access and requires client space \(O(n^{2/3}\log n+n^{1/3}\log ^2 n \cdot \lambda )\).

Standard deamortization techniques from [21] can be applied to make the overheads of our ORAM worst-case as opposed to amortized. A formal treatment of this is presented in the extended version of our paper [13], giving the following result.

Corollary 1

Let \(\lambda =\varOmega (n^{1/3})\) bits be the block size. Then our ORAM scheme has \(O(n^{1/3}\log ^2 n \cdot \lambda )\) worst-case bandwidth per access, O(1) worst-case locality per access and \(O(n^{1/3}\log ^2 n \cdot \lambda )\) client space.

5 Allocation Algorithms

As we mentioned in the introduction, to construct our final SE scheme we are going to use a series of allocation algorithms. The goal of an allocation algorithm for an SE dataset \(\mathcal {D}\) consisting of q keyword lists \(\mathcal {D}(w_1),\mathcal {D}(w_2),\ldots ,\mathcal {D}(w_q)\) is to store/allocate the elements of all lists into an array \(\mathbf A \) (or multiple arrays).

Fig. 4. Allocation algorithm for small sizes from Asharov et al. [6].

Retrieval Instructions. To be useful, an allocation algorithm should also output a hash table \(\mathsf {Tab}\) such that \(\mathsf {Tab}[w]\) contains “instructions” on how to correctly retrieve a keyword list \(\mathcal {D}(w)\) after the list is stored. For example, for a keyword list \(\mathcal {D}(w)\) that contains four elements stored at positions 5, 16, 26, 27 of \(\mathbf A \) by the allocation algorithm, some valid alternatives for the instructions \(\mathsf {Tab}[w]\) are: (i) “access positions 5, 16, 26, 27 of array A”; (ii) “access all positions from 3 to 28 of array A”; (iii) “access the whole array A”. Clearly, there are different tradeoffs among the above.

Independence Property. For security purposes, and in particular for simulating the search procedure of the SE scheme, it is important that the instructions \(\mathsf {Tab}[w]\) output by an allocation algorithm for a keyword list \(\mathcal {D}(w)\) are independent of the distribution of the rest of the dataset—intuitively this implies that accessing \(\mathcal {D}(w)\) does not reveal information about the rest of the data. This independence property is easy to achieve with a “read-all” algorithm, where the whole array is read every time a keyword is accessed, but this is very inefficient. Another way to achieve this property is to store the lists using a random permutation \(\pi \)—this is actually the allocation algorithm used by most existing SE schemes, e.g., [12]. This “permute” approach, however, has very poor locality, since it requires \(|\mathcal {D}(w)|\) random jumps in the memory to retrieve \(\mathcal {D}(w)\). In the following we present the details of our allocation algorithms for small, medium, large and huge lists. We begin with some terminology.

5.1 Buckets and Superbuckets

Following terminology from [6], our allocation algorithms use fixed-capacity buckets for storage. A bucket with capacity C can store up to C elements—in our case an element is a keyword-document pair (w, id). To simplify notation, we represent a set of B buckets \(A_1,A_2,\ldots ,A_B\) as an array \(\mathbf A \) of B buckets, referring to bucket \(A_i\) as \(\mathbf A [i]\). Additionally, a superbucket \(\mathbf A \{k,s\}\) is a set of the following s consecutive buckets

$$\begin{aligned} \mathbf A [(k-1) s+1],\mathbf A [(k-1) s+2],\ldots ,\mathbf A [k s]\,. \end{aligned}$$

We say that we store a keyword list \(\mathcal {D}(w)=\{(w,id_1),(w,id_2),\ldots ,(w,id_s)\}\) horizontally into superbucket \(\mathbf A \{k,s\}\) when each element \((w,id_i)\) is stored in a separate bucket of the superbucket.Footnote 11 Finally, the load of a bucket or a superbucket is the number of elements stored in each bucket or superbucket.
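A minimal helper pair, with names of our choosing, for the superbucket indexing and the horizontal placement just defined:

```python
def superbucket_indices(k, s):
    # Superbucket A{k, s} consists of buckets A[(k-1)s + 1], ..., A[ks].
    return list(range((k - 1) * s + 1, k * s + 1))

def store_horizontally(A, k, s, keyword_list):
    # Place each pair (w, id_i) into a distinct bucket of A{k, s};
    # A is a 1-based list of buckets (index 0 is unused).
    assert len(keyword_list) == s
    for i, pair in zip(superbucket_indices(k, s), keyword_list):
        A[i].append(pair)
```

For instance, \(\mathbf A \{2,3\}\) consists of buckets \(\mathbf A [4],\mathbf A [5],\mathbf A [6]\), and storing a list of size 3 horizontally into it adds exactly one element to each of these three buckets.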

5.2 Allocating Small Lists with Two-Dimensional Allocation

For small keyword lists we use the two-dimensional allocation algorithm of Asharov et al. [6], carefully setting the parameters from scratch. For completeness we provide the algorithm in Fig. 4, which we call AllocateSmall. Let \(C=c_s\cdot \log ^{\gamma }N\), for some appropriately chosen constant \(c_s\). The algorithm uses \(B=N/C\) buckets of capacity C each. It then considers all small keyword lists from the largest to the smallest and, depending on the list’s size s, it picks two superbuckets from \(\{1,2,\ldots ,B/s\}\) uniformly at random, horizontally placing the keyword list into the superbucket with the minimum load. The algorithm records both superbuckets as instructions in a hash table \(\mathsf {Tab}\). If, during this allocation process, some bucket overflows, then the algorithm fails. We now have the following result.

Theorem 6

Algorithm \({\mathsf {AllocateSmall}}\) in Fig. 4 outputs \({\mathsf {FAIL}}\) with probability \(\mathsf {neg}(N)\). Moreover the output array of buckets \(\mathbf {A}\) occupies space O(N).

Proof

For the algorithm to fail, the load of some bucket \(\mathbf {A}[i]\) (i.e., the maximum load) must exceed \(O(\log ^{\gamma }N)\). We show this probability is negligible for our choice of \(\gamma =2/3+\delta \). Recall that \(\mathsf {AllocateSmall}\) allocates all keyword lists using a two-dimensional balanced allocation [6]. For our proof we apply [6, Theorem 3.5], which states: For \(\mathsf {max}=N^{1-\epsilon }\), \(B\ge N/\log N\) and for a non-decreasing function f(N) such that \(f(N)=\varOmega (\log \log N)\), \(f(N)=O(\sqrt{\log N})\) and \(f(2N)=O(f(N))\), the maximum load of a two-dimensional balanced allocation is \(\frac{4N}{B}+O(\log \epsilon ^{-1}\cdot f(N))\) with probability at least \(1-O( \log \epsilon ^{-1})\cdot N^{-\varOmega (\epsilon \cdot f(N^{\epsilon }))}\). In our case, we set \(\epsilon =1/\log ^{1-\gamma }N\) and \(B=N/\log ^{\gamma }N\) and we also pick \(f(N)=\sqrt{\log N}\). Note that all conditions on f, B and \(\epsilon \) of [6, Theorem 3.5] are satisfied assuming \(1/2<\gamma < 1\). Also, for this choice of parameters, the probability that the maximum load exceeds \(O(\log ^{\gamma } N)\) is at most \(O(\log (\log ^{1-\gamma }N))\cdot N^{-\varOmega (\ell (N))}\) where \(\ell (N)\) is

$$\begin{aligned} \frac{1}{\log ^{1-\gamma }N}\sqrt{\log \left( N^{1/\log ^{1-\gamma }N}\right) }=\frac{\sqrt{\log ^\gamma N}}{\log ^{1-\gamma }N}=\log ^{3\gamma /2-1} N\,. \end{aligned}$$

Since our construction uses \(\gamma =2/3\,+\,\delta \) for any small \(\delta >0\) it is always \(3\gamma /2-1> 0\) and therefore the above probability is negligible.    \(\square \)

Note now that a list of size s can be read by accessing s consecutive buckets (i.e., a superbucket), therefore the read efficiency for these lists is \(O(\log ^{\gamma } N)\).
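The core loop of \(\mathsf {AllocateSmall}\) can be sketched as follows (hypothetical names; for simplicity we assume every list size s divides B, e.g., sizes and B are powers of two, and ties between the two choices are broken toward the first):

```python
import random

def allocate_small(lists, B):
    # Two-choice allocation: each keyword list of size s draws two
    # random superbuckets out of {1, ..., B // s} and is placed
    # horizontally into the less loaded one; both choices are
    # recorded in Tab as retrieval instructions.
    load = [0] * (B + 1)                   # load[i] = elements in bucket A[i]
    Tab = {}
    for w, lst in sorted(lists.items(), key=lambda kv: -len(kv[1])):
        s = len(lst)                       # consider largest lists first
        a, b = (random.randint(1, B // s) for _ in range(2))
        def sb_load(k):                    # current load of superbucket A{k, s}
            return sum(load[(k - 1) * s + 1 : k * s + 1])
        k = a if sb_load(a) <= sb_load(b) else b
        for j in range(s):                 # horizontal placement
            load[(k - 1) * s + 1 + j] += 1
        Tab[w] = (s, a, b)
    return load, Tab
```

In the full algorithm each bucket has capacity \(C=c_s\cdot \log ^{\gamma }N\) and the allocation fails if any bucket exceeds it; the sketch omits that check.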

5.3 Allocating Medium Lists with OTA

The allocation process for medium lists is shown in Fig. 5 and the algorithm is described in Fig. 6. The algorithm uses an array A of \(B=N/\log ^{\gamma } N\) buckets, where each bucket has capacity \(C=3\cdot \log ^{\gamma } N\). Just like \(\mathsf {AllocateSmall}\), the allocation algorithm for medium sizes stores a list \(\mathcal {D}(w)\) of size s horizontally into one of the superbuckets \(\mathbf A \{1,s\},\mathbf A \{2,s\},\ldots ,\mathbf A \{B/s,s\}\).

However, unlike \(\mathsf {AllocateSmall}\), the superbucket that is finally chosen to store \(\mathcal {D}(w)\) depends only on keyword lists of the same size as \(\mathcal {D}(w)\) that have already been allocated, and not on all other keyword lists encountered so far. In particular, let \(k_s\) be the number of keyword lists that have size s. Let also \(b_s=B/s\) be the number of superbuckets with respect to size s. To figure out which superbucket to pick for horizontally storing a particular keyword list of size s, the algorithm views the \(k_s\) keyword lists as balls and the \(b_s\) superbuckets as bins and performs an offline two-choice allocation of \(k_s\) keyword lists (balls) into \(b_s\) superbuckets (bins), as described in Sect. 3. When, during this process, some superbucket contains \( \lceil k_s/b_s\rceil +1\) keyword lists of size s, any subsequent keyword list of size s meant for this superbucket is instead placed into a stash \(\mathbf B _s\) that contains exactly \(c\cdot \log ^2 N\) buckets of size s each, for some fixed constant c derived in Theorem 1. Our algorithm will fail, if

  • Some bucket \(\mathbf A [i]\) overflows (i.e., the number of elements that are eventually stored in \(\mathbf A [i]\) exceeds its capacity C), which as we show in Lemma 4 never happens; or

  • More than \(c\cdot \log ^2 N\) keyword lists of size s must be stored at some stash \(\mathbf B _s\), which as we show in Lemma 5 happens with negligible probability.
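The cap-and-stash mechanics for a single size s can be sketched as follows. Note that the paper computes the optimal offline two-choice assignment (via a maximum-flow computation, cf. Sect. 3); the greedy online variant below, with names of our choosing, is only a stand-in to illustrate when a list is diverted to the stash:

```python
import math
import random
from collections import defaultdict

def allocate_medium_size(lists_of_size_s, b_s, stash_capacity):
    # Each list draws two superbuckets out of {1, ..., b_s}; it goes to
    # the less loaded one unless that one already holds the cap of
    # ceil(k_s/b_s) + 1 lists, in which case it is moved to stash B_s.
    k_s = len(lists_of_size_s)
    cap = math.ceil(k_s / b_s) + 1
    count = defaultdict(int)           # lists currently in each superbucket
    placement, stash = {}, []
    for w in lists_of_size_s:
        a, b = random.randint(1, b_s), random.randint(1, b_s)
        k = a if count[a] <= count[b] else b
        if count[k] < cap:
            count[k] += 1
            placement[w] = k
        else:
            stash.append(w)
        if len(stash) > stash_capacity:
            return None                # FAIL (negligible probability)
    return placement, stash
```

In the actual scheme the stash capacity is \(c\cdot \log ^2 N\) lists per size s, and the stash position of every list (dummy if it did not overflow) is recorded in \(\mathsf {Tab}\).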

All the choices that the algorithm makes, such as the two superbuckets originally chosen for every list during the offline two-choice allocation as well as the position in the stash (in case the list was an overflowing one) are recorded in \(\mathsf {Tab}\) as retrieval instructions. We now prove the following lemma.

Fig. 5. Allocation of medium lists. Each ball represents a list of size \(N^{1-1/\log ^{1-\gamma }N}\). Two balls chained together represent a keyword list of double the size and so on. Arrays \(\mathbf A _i\) show the OTA assignments for all lists of a specific size i. Arrays \(\mathbf A _i\) are merged into array \(\mathbf A \) of M buckets of capacity \(O(\log ^{\gamma }N)\) each. Overflowing lists of size i are placed in the stash \(\mathbf B _i\). Only light-gray arrays are stored at the server—white arrays are only used for illustrating the intermediate results.

Fig. 6. Allocation algorithm for medium sizes.

Lemma 4

During the execution of algorithm \({\mathsf {AllocateMedium}}\) in Fig. 6, no bucket \(\mathbf {{A}}[i]\) (for all \(i=1,\ldots ,B\)) will ever overflow.

Proof

For each size \(s=2\mathsf {min},4\mathsf {min},\ldots ,\mathsf {max}\), Line 15 of \(\mathsf {AllocateMedium}\) allows at most \(\lceil k_s/b_s\rceil +1\) keyword lists of size s to be stored in any superbucket \(\mathbf A \{i,s\}\). Since every keyword list of size s is stored horizontally in a superbucket \(\mathbf A \{i,s\}\), it follows that every bucket \(\mathbf A [i]\) within every superbucket \(\mathbf A \{i,s\}\) will have load, due to keyword lists of size s, at most \(s\cdot (\lceil k_s/b_s\rceil +1)/s=\lceil k_s/b_s\rceil +1\). Therefore the total load of a bucket \(\mathbf A [i]\) due to all sizes \(s=2\mathsf {min},4\mathsf {min},\ldots ,\mathsf {max}\) is at most \( \sum _{s}\left( \left\lceil \frac{k_s}{b_s}\right\rceil +1\right) \le \sum _{s}\frac{k_s}{b_s} + \sum _{s}2. \) We now bound the above sums separately. Since \(b_s=B/s\), \(\sum _s k_s\cdot s \le N\) and \(B=N/\log ^{\gamma } N\) it is \( \sum _{s}\frac{k_s}{b_s}=\frac{1}{B}\sum _{s}k_s\cdot s\le \frac{N}{B}=\log ^{\gamma } N \). As \(\mathsf {min}=2^{\log N-\log ^{\gamma } N+1}\), \(\mathsf {max}=N/\log ^2 N=2^{\log N-2\log \log N}\) and size s takes only powers of 2, there are at most \(\log ^{\gamma } N-2\log \log N\) terms in the sum \(\sum _s 2\) and therefore \( \sum _{s}\left( \left\lceil \frac{k_s}{b_s}\right\rceil +1\right) \le 3\cdot \log ^{\gamma } N-4\cdot \log \log N \le 3\cdot \log ^{\gamma } N \), which equals the bucket capacity C in \(\mathsf {AllocateMedium}\). Thus no bucket will ever overflow.    \(\square \)

Lemma 5

During the execution of algorithm \({\mathsf {AllocateMedium}}\) in Fig. 6, no stash \(\mathbf {{B}}_s\) (for \(s=2{\mathsf {min}},4{\mathsf {min}},\ldots ,{\mathsf {max}}\)) will ever overflow, except with probability \(\mathsf {neg}(N)\).

Proof

Recall that for each \(s=2\mathsf {min},4\mathsf {min},\ldots ,\mathsf {max}\), placing the \(k_s\) keyword lists of size s into the \(b_s\) superbuckets of size s is performed via an offline two-choice allocation of \(k_s\) balls into \(b_s\) bins. Also recall that the lists that end up in the stash \(\mathbf B _s\) (that has capacity \(c\cdot \log ^2 N\) lists) are originally placed by the allocation algorithm in superbuckets containing more than \(\lceil k_s/b_s \rceil +1\) keyword lists of size s, thus they are overflowing. Let \(T_s\) be the number of these lists. By Theorem 3, where we set \(T=T_s\) and \(n=b_s\) and \(\tau =\log N\), we have that for large \(b_s\) and for fixed constants c, \(c_1\) and \(c_2\)

$$\begin{aligned} \Pr [T_s>c\cdot \log ^2 N] \le \left( \frac{e}{\log N}\right) ^{c\cdot \log N}+\left( \frac{c_1}{b_s}\right) ^{\log N}+c_2\sqrt{b_s}\cdot 0.9^{b_s}=\mathsf {neg}(N)\, , \end{aligned}$$

as \(b_s=B/s=\frac{N}{s\log ^{\gamma } N} \ge \log ^{2-\gamma } N=\omega (\log N)\), since \(s\le \mathsf {max}=N/\log ^2 N\).    \(\square \)

Theorem 7

Algorithm \({\mathsf {AllocateMedium}}\) in Fig. 6 outputs \({\mathsf {FAIL}}\) with probability \(\mathsf {neg}(N)\). Moreover, the size of the output array A and the stashes \(\mathcal {B}\) is O(N).

Proof

\(\mathsf {AllocateMedium}\) can fail either because a bucket \(\mathbf A [i]\) overflows, which by Lemma 4 happens with probability 0, or because some stash \(\mathbf B _s\) ends up having to store more than \(c\cdot \log ^2 N\) lists for some \(s=2\mathsf {min},4\mathsf {min},\ldots ,\mathsf {max}\), which by Lemma 5 happens with probability \(\mathsf {neg}(N)\). For the space complexity, since no bucket \(\mathbf A [i]\) overflows, array \(\mathbf A \) occupies space O(N). Also each stash \(\mathbf B _s\) contains \(c\cdot \log ^2 N\) buckets of size s each so the total size required by the stashes is \( c\cdot \log ^2 N(\mathsf {min}+2\mathsf {min}+4\mathsf {min}+\ldots +\mathsf {max})\). Since \(\mathsf {max}=N/\log ^2 N\), the above is \(\le 2 c \log ^2 N \cdot \mathsf {max} = O(N)\).    \(\square \)

Fig. 7. Allocation algorithm for large sizes.

5.4 Allocating Large Lists

Recall that we call a keyword list large, if its size is in the range \(N/\log ^2 N\) and \(N/\log ^{\gamma } N\) (recall \(\gamma =2/3+\delta \)). Algorithm \(\mathsf {AllocateLarge}\) in Fig. 7 is used to allocate lists whose size falls within a specific subrange \((\mathsf {min},\textsf {max}]\) of the above range. Let \(\mathsf {step}\) be an appropriately chosen parameter such that \(\mathsf {step} < 3\delta /2\) and partition the range \((N/\log ^2 N,N/\log ^{\gamma }N]\) into \(\frac{2-\gamma }{\mathsf {step}}\) consecutive subrangesFootnote 12

$$\begin{aligned} \left( \frac{N}{\log ^2 N},\frac{N}{\log ^{2-\mathsf {step}}N}\right] ,\left( \frac{N}{\log ^{2-\mathsf {step}} N},\frac{N}{\log ^{2-2 \cdot \mathsf {step}}N}\right] ,\ldots ,\left( \frac{N}{\log ^{\gamma - \mathsf {step}} N},\frac{N}{\log ^{\gamma }N}\right] . \end{aligned}$$

For a given subrange \((\mathsf {min},\mathsf {max}]\), \(\mathsf {AllocateLarge}\) stores all keyword lists in an array \(\mathbf A \) of \(t=N/\mathsf {max}\) buckets of capacity \(2\mathsf {max}\) each. In particular, for a large keyword list \(\mathcal {D}(w)\) of size s, the algorithm places the list in the first bucket it finds with enough available space. We later prove that there will always be such a bucket, and therefore no overflow will ever happen. The formal description of the algorithm is shown in Fig. 7.
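A first-fit sketch of this procedure, with hypothetical names (lists maps each keyword to its list size s, with \(s\le \mathsf {max}\)):

```python
def allocate_large(lists, N, max_size):
    # First-fit into t = N / max buckets of capacity 2 * max each;
    # the counting argument of Theorem 8 guarantees a bucket with
    # enough free space always exists, so FAIL is unreachable.
    t = N // max_size
    free = [2 * max_size] * t
    placement = {}
    for w, s in lists.items():
        for i in range(t):
            if free[i] >= s:
                free[i] -= s
                placement[w] = i
                break
        else:
            raise RuntimeError("FAIL")  # cannot happen by Theorem 8
    return placement
```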

Theorem 8

Algorithm \({\mathsf {AllocateLarge}}\) in Fig. 7 never outputs \({\mathsf {FAIL}}\).

Proof

Assume \(\mathsf {AllocateLarge}\) fails. This means that at the time some list \(\mathcal {D}(w)\) is considered, all buckets of \(\mathbf A \) store at least \(2\mathsf {max}-s+1\) elements each. Therefore the total number of elements considered so far is \( \frac{N}{\mathsf {max}}(2\mathsf {max}-s+1)\ge \frac{N}{\mathsf {max}}(\mathsf {max}+1)\ge N+\frac{N}{\mathsf {max}}\ge N+\log ^{\gamma }N, \) since \(s\le \mathsf {max}\le N/\log ^{\gamma }N\). This is a contradiction, however, since the number of entries of our dataset is exactly N.    \(\square \)

5.5 Allocating Huge Lists with a Read-All Algorithm

Keyword lists that have size greater than \(N/\log ^{\gamma } N\) up to N are stored directly in an array \(\mathbf A \) of N entries, one after the other—see Fig. 8. To read a huge list in our actual construction, one would have to read the whole array \(\mathbf A \)—however, due to the huge size of the list, the read efficiency would still be small.
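A sketch of this trivial allocation (names are ours; the offsets play the role of the retrieval instructions in \(\mathsf {Tab}\)):

```python
def allocate_huge(lists):
    # Write huge lists back to back into a flat array A; Tab records
    # each list's (offset, length), although the search protocol
    # reads the whole array A anyway.
    A, Tab, offset = [], {}, 0
    for w, lst in lists.items():
        Tab[w] = (offset, len(lst))
        A.extend(lst)
        offset += len(lst)
    return A, Tab
```

Since a huge list has size greater than \(N/\log ^{\gamma } N\), reading all N entries of \(\mathbf A \) costs at most a \(\log ^{\gamma } N\) factor more than the result size.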

Fig. 8. Allocation algorithm for huge sizes.

6 Our SE Construction

We now present our main construction that uses the ORAM scheme presented in Sect. 4 and the allocation algorithms presented in Sect. 5 as black boxes. Our formal protocols are shown in Figs. 9 and 10.

6.1 Setup Protocol of SE Scheme

Our setup algorithm allocates lists depending on whether they are small, medium, large or huge, as defined in Sect. 5. We describe the details below.

Small Keyword Lists. These are allocated to superbuckets using \(\mathsf {AllocateSmall}\) from Sect. 5.2. The allocation algorithm outputs an array of buckets \(\mathbf S \) storing the small keyword lists and the instructions hash table \(\mathsf {Tab}_S\) storing, for each small keyword list \(\mathcal {D}(w)\), its size s and the superbuckets \(\alpha \) and \(\beta \) assigned for this keyword list by the allocation algorithm. The setup protocol of the SE scheme finally encrypts and writes bucket array \(\mathbf S \) and stores it remotely—see Line 5 in Fig. 9. It stores \(\mathsf {Tab}_S\) locally.

Medium Keyword Lists. These are allocated to superbuckets using \(\mathsf {AllocateMedium}\) from Sect. 5.3. \(\mathsf {AllocateMedium}\) outputs (i) an array of buckets \(\mathbf M \); (ii) the set of stashes \(\{\mathbf {B}_s\}_s\) that handle the overflows, for all sizes s in the range; (iii) the instructions hash table \(\mathsf {Tab}_M\) storing, for each keyword list \(\mathcal {D}(w)\) that falls into this range, its size s, the superbuckets \(\alpha \) and \(\beta \) assigned for this keyword list and a stash position x in the stash \(\mathbf B _s\) where the specific keyword list could have been potentially stored, had it caused an overflow (otherwise a dummy position is stored). The setup protocol finally encrypts and writes \(\mathbf M \) and stores it remotely—see Line 8 in Fig. 9. It also builds an ORAM per stash \(\mathbf B _s\)—see Line 15 in Fig. 9. Finally, it stores \(\mathsf {Tab}_M\) locally.

Large Keyword Lists. These are allocated to buckets using \(\mathsf {AllocateLarge}\) from Sect. 5.4. To keep read efficiency small, we run \(\mathsf {AllocateLarge}\) for \(\frac{2-\gamma }{\mathsf {step}}\) distinct subranges, as we detailed in Sect. 5. For the subrange of \((N/\log ^{2-(h-1)\cdot \mathsf {step}}N,\) \(N/\log ^{2-h\cdot \mathsf {step}}N]\), \(\mathsf {AllocateLarge}\) outputs an array of buckets \(\mathbf L _h\) and a hash table \(\mathsf {Tab}_\mathbf{L _h}\). The setup protocol builds an ORAM for the array \(\mathbf L _h\) and it stores \(\mathsf {Tab}_\mathbf{L _h}\) locally.

Huge Keyword Lists. For these lists, we use \(\mathsf {AllocateHuge}\) from Sect. 5.5. This algorithm outputs an array \(\mathbf H \) and a hash table \(\mathsf {Tab}_H\). Our setup protocol encrypts and writes \(\mathbf H \) remotely and stores \(\mathsf {Tab}_H\) locally.

Local State and Using Tokens. For the sake of simplicity and readability of Fig. 9, we assume that the client keeps locally the hash table \(\mathsf {Tab}\)—see Line 13. This occupies linear space O(N) but can be securely outsourced using standard SE techniques [31], and without affecting the efficiency (read efficiency and locality): For every hash table entry \(w\rightarrow [s,\alpha ,\beta ,x]\), store at the server the “encrypted” hash table entry \( t_w\rightarrow \mathsf {ENC}_{k_w}(s||\alpha ||\beta ||x) \), where \(t_w\) and \(k_w\) comprise the tokens for keyword w (these are the outputs of a PRF applied on w with two different secret keys that the client stores) and ENC is a CPA-secure encryption scheme. To search for keyword w, the client just needs to send to the server the tokens \(t_w\) and \(k_w\) and the server can then search the encrypted hash table and retrieve the information \(s||\alpha ||\beta ||x\) by decrypting.
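The token mechanism can be sketched as follows, using HMAC as the PRF and a toy hash-based stream cipher in place of a proper CPA-secure scheme (function names, the key derivation and the exact encoding of \(s||\alpha ||\beta ||x\) are our choices, not the paper's):

```python
import hashlib
import hmac
import os

def prf(key, msg):
    # HMAC-SHA256 as the pseudorandom function F_key(msg).
    return hmac.new(key, msg, hashlib.sha256).digest()

def enc(k, plaintext):
    # Toy randomized encryption: random nonce + SHA-256 keystream.
    # A real deployment would use an authenticated cipher instead.
    nonce = os.urandom(16)
    stream = hashlib.sha256(k + nonce).digest()
    assert len(plaintext) <= len(stream)
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

def dec(k, ct):
    nonce, body = ct[:16], ct[16:]
    stream = hashlib.sha256(k + nonce).digest()
    return bytes(a ^ b for a, b in zip(body, stream))

def outsource_table(master_key, Tab):
    # Turn each entry w -> (s, alpha, beta, x) into the encrypted
    # server-side entry t_w -> ENC_{k_w}(s||alpha||beta||x).
    K1, K2 = prf(master_key, b"token"), prf(master_key, b"enc")
    server = {}
    for w, (s, a, b, x) in Tab.items():
        t_w = prf(K1, w.encode())      # search token for w
        k_w = prf(K2, w.encode())      # per-keyword decryption key
        server[t_w] = enc(k_w, f"{s}|{a}|{b}|{x}".encode())
    return server

def search_tokens(master_key, w):
    # The client sends (t_w, k_w) to the server to search for w.
    K1, K2 = prf(master_key, b"token"), prf(master_key, b"enc")
    return prf(K1, w.encode()), prf(K2, w.encode())
```

Given the tokens, the server looks up \(t_w\) in its encrypted table and decrypts the retrieval instructions with \(k_w\), without ever learning the keyword w itself.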

Fig. 9. The setup protocol of our SE construction.

Fig. 10. The search protocol of our SE construction.

Handling ORAM State and Failures. Our setup protocol does not store locally the ORAM states \(\sigma _s\) and \(\sigma _h\) of the stashes \(\mathbf B _s\) and the arrays \(\mathbf L _h\) for which we build an ORAM. Instead, it encrypts and writes them remotely and downloads them when needed—see Line 17 in Fig. 9. Also, our setup algorithm fails whenever any of the allocation algorithms fail. By Theorems 6, 7 and 8 we have the following:

Lemma 6

Protocol \(\textsc {Setup}\) in Fig. 9 fails with probability \(\mathsf {neg}(N)\).

Lemma 7

Protocol \(\textsc {Setup}\) in Fig. 9 outputs an encrypted index \(\mathcal {I}\) that has O(N) size and runs in O(N) time.

Proof

The space complexity follows from Theorems 6 and 7, by the fact that array \(\mathbf H \) output by \(\textsf {AllocateHuge}\) has size O(N), by the fact that we keep a number of arrays for large keyword lists that is independent of N, and by the fact that the ORAM states \(\sigma _s\) and \(\sigma _h\), being asymptotically smaller than the ORAMs themselves, occupy at most linear space. For the running time, note that AllocateSmall, AllocateLarge and AllocateHuge run in linear time, and the ORAM setup algorithms also run in linear time (by the same analysis as for the space). By Lemma 1, AllocateMedium must perform a costly \(O(n^3)\) offline allocation (a maximum flow computation), where n is the number of superbuckets defined for every size s in the range. The maximum number of superbuckets M is achieved for the smallest size handled by AllocateMedium and is equal to \(M=\frac{N}{N^{1-1/\log ^{1-\gamma } N}\cdot \log ^{\gamma } N}= N^{1/\log ^{1-\gamma } N}/\log ^{\gamma } N\).

Recall that there are at most \(\log ^{\gamma }N\) sizes handled by AllocateMedium and therefore the time required to do the offline allocation is at most \(O\left( \log ^{\gamma }N\cdot M^3\right) \) which is equal to \(O(N^{3/\log ^{1-\gamma } N }/\log ^{2\gamma }N)=O(N)\). Therefore, the running time is O(N).    \(\square \)

6.2 Search Protocol of SE Scheme

Given a keyword w, the client first retrieves information \((s,\alpha ,\beta ,x)\) from \(\textsf {Tab}[w]\). Depending on the size s of \(\mathcal {D}(w)\) the client takes the following actions (see Fig. 10):

  • If the list \(\mathcal {D}(w)\) is small, the client reads two superbuckets \(\mathbf S \{\alpha ,s\}\) and \(\mathbf S \{\beta ,s\}\) and decrypts them. Since the size of the buckets \(\mathbf S [i]\) is \(\log ^{\gamma }N\) and each superbucket contains s of them, it follows that the read efficiency for small sizes is \(\varTheta (\log ^{\gamma } N)\). Also, since only two superbuckets are read, the locality for small lists is O(1).

  • If the list \(\mathcal {D}(w)\) is medium, the client reads two superbuckets \(\mathbf M \{\alpha ,s\}\) and \(\mathbf M \{\beta ,s\}\) and decrypts them. The client also performs an ORAM access on the stash \(\mathbf B _s\) at location x. Since the size of the buckets \(\mathbf M [i]\) is \(O(\log ^{\gamma } N)\) and each superbucket contains s of them, the read efficiency for medium sizes due to accessing array \(\mathbf M \) is \(O(\log ^{\gamma } N)\). For the ORAM access, note that in our case \(n=c\cdot \log ^2 N\). Therefore, by Corollary 1, and since our block size is at least \(N^{1-1/\log \log N}\), which is \(\varOmega (\log ^{2/3} N)\), the bandwidth required is \(O(n^{1/3}\log ^2 n \cdot s)=O(\log ^{2/3} N\log ^2 \log N \cdot s)\); hence the read efficiency due to the ORAM access is \(O(\log ^{2/3} N\log ^2 \log N)=o(\log ^{\gamma } N)\), since \(\gamma =2/3+\delta \). Therefore, the overall read efficiency for medium sizes is \(O(\log ^{\gamma } N)\). Again, since only two superbuckets are read and the ORAM locality is O(1) (Corollary 1), the locality for medium lists is O(1).

  • Suppose now that the list \(\mathcal {D}(w)\) is large, i.e., \(\mathsf {min}<|\mathcal {D}(w)|\le \mathsf {max}\), where \(\mathsf {min}=N/\log ^{2-(h-1)\cdot \mathsf {step}}N\) and \(\mathsf {max}=N/\log ^{2-h\cdot \mathsf {step}}N\) for some \(h\in \{1,2,\ldots ,\frac{2-\gamma }{\mathsf {step}}\}\). To retrieve the list, our search algorithm performs an ORAM access on an array of \(N/\mathsf {max}\) blocks of size \(2\cdot \mathsf {max}\) each. By Corollary 1, the worst-case bandwidth for this access is

    $$\begin{aligned} O\left( \left( \frac{N}{\mathsf {max}}\right) ^{1/3} \log ^2\left( \frac{N}{\mathsf {max}}\right) \mathsf {max}\right) = O\left( N \left( \log ^{2-h\cdot \mathsf {step}}N\right) ^{-2/3}\log ^2\log N\right) . \end{aligned}$$

    For read efficiency, note that the client must use this bandwidth to read a keyword list of size \(s\ge \mathsf {min}= N/\log ^{2-(h-1)\cdot \mathsf {step}}N\). Thus, the read efficiency is at most

    $$\begin{aligned} O\left( \log ^{2-(h-1)\cdot \mathsf {step}} N\cdot \left( \log ^{2-h\cdot \mathsf {step}}N\right) ^{-2/3}\log ^2\log N\right) =O(\log ^{\gamma '}N \log ^2 \log N)\,, \end{aligned}$$

    where for all \(h\ge 1\) we have \(\gamma '\le 2/3+2\cdot \mathsf {step}/3 < 2/3+\delta =\gamma \), since \(\mathsf {step}<3\delta /2\). Therefore, the above is \(o(\log ^{\gamma } N)\), as required.

  • For huge sizes, the read efficiency is at most \(O(\log ^{\gamma } N)\) and the locality is constant since the whole array \(\mathbf H \) is read.
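As a sanity check on the exponent \(\gamma '\) in the large-list case above (a small illustrative computation; the values \(\mathsf {step}=\delta =0.05\) are arbitrary choices satisfying \(\mathsf {step}<3\delta /2\)), note that \(\gamma '\) simplifies to \(2/3+\mathsf {step}\cdot (1-h/3)\), which is maximized at \(h=1\):

```python
def gamma_prime(h, step):
    """Exponent of log N in the large-list read-efficiency bound:
    gamma' = (2 - (h-1)*step) - (2/3)*(2 - h*step) = 2/3 + step*(1 - h/3)."""
    return (2 - (h - 1) * step) - (2 / 3) * (2 - h * step)

step, delta = 0.05, 0.05          # arbitrary values with step < 3*delta/2
bound = 2 / 3 + 2 * step / 3      # attained at h = 1
assert all(gamma_prime(h, step) <= bound + 1e-12 for h in range(1, 40))
assert bound < 2 / 3 + delta      # hence gamma' < gamma = 2/3 + delta
```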

Therefore, overall, the locality is O(1), the read efficiency is \(O(\log ^{\gamma } N)\) and the space required at the server is O(N).
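The case analysis above can be summarized as a size-dispatch routine. The following sketch (hypothetical function name; thresholds paraphrased from the analysis, with \(\gamma =2/3\) standing in for \(2/3+\delta \) and the exact boundaries depending on \(\delta \) and \(\mathsf {step}\)) only classifies a result size into the four regimes:

```python
import math

def classify_list_size(s, N, gamma=2/3):
    """Classify |D(w)| into the regime that determines which structure the
    client reads: S (small), M plus a stash (medium), the large-list ORAM
    arrays (large), or the whole array H (huge). Thresholds are the
    approximate regime boundaries from the analysis."""
    logN = math.log2(N)
    medium_min = N ** (1 - 1 / logN ** (1 - gamma))  # smallest medium size
    large_min = N / logN ** 2                        # h = 1 lower boundary
    huge_min = N / logN ** gamma                     # h = (2-gamma)/step
    if s <= medium_min:
        return "small"
    if s <= large_min:
        return "medium"
    if s <= huge_min:
        return "large"
    return "huge"
```

For instance, with \(N=2^{40}\), a list of a few hundred entries is small, while a list larger than \(N/\log ^{\gamma }N\) entries triggers the download of the entire array \(\mathbf H \).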

Rounds of Interaction. Our protocol requires O(1) rounds of interaction per query. In particular, for small and huge list sizes our construction requires a single round of interaction, as can easily be inferred from Fig. 10. For medium and large sizes, the deamortized version of our protocol, which uses the deamortized ORAM from the extended version [13], requires four rounds of interaction.

Client Space. Finally, we measure the storage at the client (assuming, as discussed in Sect. 6.1, that \(\mathsf {Tab}\) is stored at the server). For small lists, it follows from our read-efficiency analysis above that the storage at the client is \(O(\log ^\gamma N\cdot s)\). Note that, by Corollary 1, for medium and large list sizes the space needed at the client for the ORAM protocol is \(O(n^{1/3} \log ^2 n \cdot s)\), where n is the number of ORAM indices and s is the result list size (this result uses the deamortized version of our ORAM from the extended version [13]). Since \(n \le \log ^2 N\), this becomes \(O(\log ^{2/3} N \log ^2 \log N \cdot s)\). Specifically, for medium lists the client also needs to download two superbuckets, for total storage \(O(\log ^\gamma N\cdot s)\). For huge list sizes, recall that the client downloads the entire array \(\mathbf H \), which results in space O(N). However, in this case \(s > N/\log ^{\gamma }N\), hence \(N < s \cdot \log ^{\gamma }N\) and the client storage can be written as \(O(\log ^{\gamma }N \cdot s)\). We stress that any searchable encryption scheme requires \(\varOmega (s)\) space at the client simply to download the result of a query. Thus, in all cases our scheme imposes only a multiplicative overhead on client storage that is sub-logarithmic in the database size, compared to this minimum requirement. Moreover, this storage is transient, i.e., it is only needed while issuing a query; between queries, the client requires O(1) space.

6.3 Security of Our Construction

We now prove the security of our construction. For this, we build simulators \(\textsc {SimSetup}\) and \(\textsc {SimSearch}\), shown in Figs. 11 and 12 respectively.

Fig. 11. The simulator of the setup protocol of our SE scheme.

Simulation of the Setup Protocol. To simulate the setup protocol, our simulator must output \(\mathcal {I}_0\) using just the leakage \(\mathcal {L}_1(D_0)=N\). Our \(\textsc {SimSetup}\) algorithm outputs \(\mathcal {I}_0\) as CPA-secure encryptions of arrays \((\mathbf S ,\mathbf M ,\mathbf H )\) that contain dummy values and have the same dimensions as the arrays of the actual setup algorithm. It also calls the ORAM simulator and outputs \(\{\sigma _s,\mathsf {EM}_s\}\) and \(\{\sigma _h,\mathsf {EM}_h\}\). Due to the security of the underlying ORAM scheme and the CPA-security of the underlying encryption scheme, the adversary cannot distinguish between the two outputs.

One potential problem, however, is that \(\textsc {SimSetup}\) always succeeds, whereas the real setup algorithm can fail, which would enable the adversary to distinguish between the two. However, by Lemma 6, this happens with probability \(\mathsf {neg}(N)=\mathsf {neg}(\kappa )\), as required by our security definition, Definition 1.
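The indistinguishability argument can be illustrated with a minimal sketch. Here the array lengths (written simply as \(c\cdot N\)) and the use of random bytes in place of CPA-secure ciphertexts are placeholder assumptions; the point is only that the output depends on nothing but the leakage N:

```python
import os

def sim_setup(N, entry_bytes=32, c=2):
    """Sketch of SimSetup: from the leakage L1(D0) = N alone, output dummy
    arrays with the same dimensions as the real index. Random bytes stand in
    for CPA-secure encryptions of dummy values, which a computationally
    bounded adversary cannot distinguish from the real ciphertexts."""
    S = [os.urandom(entry_bytes) for _ in range(c * N)]
    M = [os.urandom(entry_bytes) for _ in range(c * N)]
    H = [os.urandom(entry_bytes) for _ in range(c * N)]
    # A full simulator would also invoke the ORAM simulator here to
    # produce (sigma_s, EM_s) and (sigma_h, EM_h).
    return S, M, H
```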

Simulation of the Search Protocol. The simulator of the \(\textsc {Search}\) protocol is shown in Fig. 12. For a keyword query \(w_k\), the simulator takes as input the leakage \(\mathcal {L}_2(w_k)=(s,b)\), as defined in Relation 1.

If the query on \(w_k\) was performed before (thus \(b\ne \bot \)), the simulator just outputs the previous messages \(M_b\) plus the messages that were output by the ORAM simulator.

If the query on \(w_k\) was not performed before, the simulator generates the messages \(M_k\) depending on the size s of the list \(\mathcal {D}(w_k)\). In particular, note that all accesses to \((\mathbf S ,\mathbf M ,\mathbf H ,\mathbf L _h)\) are independent of the dataset and can therefore be simulated by repeating the same process as in the real execution.

Fig. 12. The simulator of the search protocol of our SE scheme.

7 Conclusions and Observations

Basing the Entire Scheme on ORAM. Our construction uses ORAM as a black box, so one may wonder why we do not use ORAM from the very beginning, on the whole dataset. While ORAM would provide much stronger security guarantees, it suffers from high read efficiency: to the best of our knowledge, there is no ORAM we could use that yields sublogarithmic read efficiency (irrespective of the locality).

Avoiding the Lower Bound of [6]. We note that Proposition 4.6 of Asharov et al. [6] states that one cannot expect to construct an allocation algorithm in which the square of the locality times the read efficiency is \(O(\log N/\log \log N)\). This is the case with our construction! The reason this proposition does not apply to our approach is that our allocation algorithm uses multiple structures for storage, e.g., stashes and multiple arrays, and therefore does not fall into the model used to prove the negative result.

Reducing the ORAM Read Efficiency. Our technique for building our ORAM in Sect. 4 relies on one hierarchical application of the method of square-root ORAM [18]. We believe this approach can be generalized to yield read efficiency \(O(n^{1/k}\log ^2 n\cdot \lambda )\) for general k. The necessary analysis, while tedious, seems technically non-challenging and we leave it for future work (e.g., we could revisit some ideas from [34]). Such an ORAM could also help us decrease the number of subranges on which we apply our \(\mathsf {AllocateLarge}\) algorithm.

Using Online Two-Choice Allocation. Our construction uses the offline variant of the two-choice allocation problem. This allows us to achieve strong bounds on both the number of overflowing bins and the total overflow size in Sect. 3. However, it requires executing a maximum-flow algorithm during our construction’s setup. A natural question is whether we can instead use the (more efficient) online two-choice allocation. The best known result [9] for the online version yields a maximum load of \(O(\log \log n)\) beyond the expected value \(m/n\), which suffices to bound the maximum number of overflowing bins with our technique. However, deriving a similar bound for the total overflow size would require entirely different techniques, and we leave it as an open problem. Still, it seems that even if we could get the same bound for the overflow size as in the offline case, the read efficiency would be \(O(\log ^{\gamma } N \log \log N)\), as opposed to the better \(O(\log ^{\gamma } N)\) that we achieve here.
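For intuition, the online variant is easy to simulate (a toy experiment, not part of our construction): each ball is placed in the lighter of two uniformly random bins, and the maximum load stays close to the average \(m/n\), consistent with the \(O(\log \log n)\) gap of [9]:

```python
import random

def online_two_choice(m, n, seed=0):
    """Place m balls into n bins; each ball goes to the lighter of two
    independently, uniformly chosen bins. Returns the final bin loads."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        loads[i if loads[i] <= loads[j] else j] += 1
    return loads

loads = online_two_choice(m=10_000, n=1_000)
gap = max(loads) - 10_000 // 1_000   # overflow above the expected load m/n
```

With a single random choice per ball the gap would instead grow as \(\varTheta (\sqrt{(m/n)\log n})\); the second choice is what collapses it to \(O(\log \log n)\).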

Reducing the Read Efficiency for Small Lists. The read efficiency of our scheme for small lists can be strictly improved if, instead of using [6], we use the construction of Asharov et al. [7], proposed in concurrent work. In this manner, the read efficiency for a small keyword list of size \(N^{1 - \epsilon }\) would be \(\omega (1)\cdot \epsilon ^{-1} + O(\log \log \log N)\).