3PC ORAM with Low Latency, Low Bandwidth, and Fast Batch Retrieval

Jarecki, Stanislaw; Wei, Boyang

doi:10.1007/978-3-319-93387-0_19


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10892)

Included in the following conference series:

International Conference on Applied Cryptography and Network Security


Abstract

Multi-Party Computation of Oblivious RAM (MPC ORAM) implements secret-shared random access memory in a way that protects access pattern privacy against a threshold of corruptions. MPC ORAM enables secure computation of any RAM program on large data held by different entities, e.g. MPC processing of database queries on a secret-shared database. MPC ORAM can be constructed from any (client-server) ORAM, but there is an efficiency gap between known MPC ORAMs and ORAMs. The current asymptotically best MPC ORAM is implied by an “MPC friendly” variant of Path-ORAM [26] called Circuit-ORAM, due to Wang et al. [27]. However, using garbled circuits for Circuit-ORAM’s client implies an MPC ORAM which matches Path-ORAM in rounds but increases bandwidth by an \(\varOmega (\kappa )\) factor, while using GMW or BGW protocols implies an MPC ORAM which matches Path-ORAM in bandwidth, but increases round complexity by an \(\varOmega ({\log n}\log {\log n})\) factor, where \(\kappa \) is a security parameter and \(n\) is memory size.

In this paper we bridge the gap between MPC ORAM and client-server ORAM by showing a specialized 3PC ORAM protocol, i.e. MPC ORAM for 3 parties tolerating 1 fault, which uses only symmetric ciphers and asymptotically matches client-server Path-ORAM in round complexity and for large records also in bandwidth.

Our 3PC ORAM also allows for fast pipelined processing: With postponed clean-up it processes \(b\,{=}\,O({\log n})\) accesses in \(O(b\,{+}\,{\log n})\) rounds with \(O(D\,{+}\,\mathsf {poly}({\log n}))\) bandwidth per item, where \(D\) is record size.

The full version of this paper appears in Cryptology ePrint Archive [21].


1 Introduction

MPC ORAM. Multi-Party Computation Oblivious Random Access Memory (MPC ORAM), or Secure-Computation ORAM (SC ORAM), is a protocol which lets \(m\) parties implement access to a secret-shared memory in such a way that both memory records and the accessed locations remain hidden, and this security guarantee holds as long as no more than \(t\) out of \(m\) parties are corrupted. Applications of MPC ORAM stem from the fact that it can implement the random memory access subprocedure within secure computation of any RAM program. Classic approaches to secure computation [3, 8, 17, 29] express computation as a Boolean or arithmetic circuit, thus their size, and consequently efficiency, is inherently lower-bounded by the size of their inputs. In practice this eliminates the possibility of secure computation involving large data, including such fundamental computing functionality as search and information retrieval. MPC ORAM makes such computation feasible because it generalizes secure computation from circuits to RAM programs: All RAM program instructions can be implemented using circuit-based MPC, since they involve only local variables, while access to (large) memory can be implemented with MPC ORAM.

As an application of MPC of RAM program, and hence of MPC ORAM, consider an MPC Database, i.e. an MPC implementation of processing of database queries over a secret-shared database. A typical database implementation would hash a searched keyword to determine an address of a hash table page whose content is then matched against the queried keyword. Standard MPC techniques can implement the hashing step, but the retrieval of the hash page is a random access to a large memory. Implementing this RAM access via garbled circuits requires \(\varOmega (nD\kappa )\) bandwidth, where \(n\) is the number of records, \(D\) is the record size, and \(\kappa \) is the cryptographic security parameter, which makes such computation unrealistic even for 1MB databases. By contrast, using MPC ORAM can cost \(O(\mathsf {poly}({\log n})D\kappa )\) and hence, in principle, can scale to large data.

Inefficiency Gap in MPC ORAM Constructions. The general applicability of MPC ORAM to MPC of any RAM program motivates searching for efficient MPC ORAM realizations. As pointed out in [10, 23], any ORAM with its client implemented with an MPC protocol yields MPC ORAM. This motivates searching for an ORAM with an MPC-friendly client, i.e. a client which can be efficiently computed using MPC techniques [16, 19, 22, 27, 28]. Indeed, the recent Circuit-ORAM proposal of Wang et al. [27] exhibits a variant of Path-ORAM of Stefanov et al. [26] whose client has a Boolean circuit of an asymptotically optimal size, i.e. a constant factor of the data which Path-ORAM client retrieves from the server, and which forms an input to its computation.

Still, in spite of the circuit-size optimality of Circuit-ORAM,^{Footnote 1} applying generic honest-but-curious MPC protocols to it yields MPC ORAM solutions which are two orders of magnitude more expensive than Path-ORAM:^{Footnote 2} Using Yao’s garbled circuit [29] on Circuit-ORAM yields a 2PC ORAM of [27] which has (asymptotically) the same round complexity as Path-ORAM, but its bandwidth, both online and in offline precomputation, is larger by an \(\varOmega (\kappa )\) factor. Alternatively, applying GMW [17] or BGW [3] to the Boolean circuit for Circuit-ORAM yields 2PC or MPC ORAM which asymptotically preserves Path-ORAM bandwidth, but its round complexity is larger by an \(\varOmega ({\log n}\log {\log n})\) factor (compare footnote 3).

Our Contribution: 3PC ORAM with Low Latency and Bandwidth. We show that the gap between MPC ORAM and client-server ORAM can be bridged by exhibiting a 3PC ORAM, i.e. MPC for \(m\,{=}\,3\) servers with \(t\,{=}\,1\) fault, which uses customized, i.e. non-generic, 3PC protocols and asymptotically matches Path-ORAM in rounds, and, for record size \(D\,=\,\varOmega (\kappa {{\log ^2}n})\), bandwidth. Specifically, our 3PC ORAM securely emulates the Circuit-ORAM client in the 3PC setting, using \(O({\log n})\) rounds and \(O(\kappa {{\log ^3}n}+D{\log n})\) bandwidth (see Fig. 1). We note that the 3PC setting of \((t,m)\) = (1, 3) gives weaker security than the 2PC setting of \((t,m)\) = (1, 2), but it was shown to enable lower-cost solutions to many secure computation problems compared to both 2PC and general \((t,m)\)-MPC (e.g. [1, 5]) and for that reason it is often chosen in secure computation implementations (e.g. [4, 6]). Here we show that 3PC benefits extend to MPC ORAM.

We show the benefits of our 3PC ORAM contrasted with previous 2PC and 3PC ORAM approaches in Fig. 1. In the 3PC setting we include a generic 3PC Circuit-ORAM, which results from implementing Circuit-ORAM with the generic 3PC protocol of Araki et al. [1], which is the most efficient 3PC instantiation we know of either the BGW or the GMW framework.^{Footnote 3} The second 3PC ORAM we compare to is Faber et al. [14], which uses non-generic 3PC techniques, like we do, but emulates in 3PC a Binary-Tree ORAM variant that is less efficient than Circuit-ORAM, yielding a 3PC ORAM with bandwidth worse than ours by an \(\varOmega (\lambda )\) factor. Regarding 2PC ORAM, several 2PC ORAMs based on Binary-Tree ORAM variants were given prior to Circuit-ORAM [16, 19, 22, 28], but we omit them from Fig. 1 because Circuit-ORAM outperforms them [27]. We include two recent alternative approaches, 2PC ORAM of [30] based on Square-Root ORAM of [18], and 2PC FLORAM of [12] based on the Distributed Point Function (DPF) of [20]. However, both of these 2PC ORAMs use \(O(\sqrt{n})\) bandwidth, and [12] also uses \(O(n)\) local computation, so they do not scale well for large \(n\).^{Footnote 4} Restricting the comparison to poly(\({\log n}\)) MPC ORAM, our 3PC ORAM offers the following trade-offs:

(1) Compared to the generic 3PC protocol [1] applied to Circuit-ORAM, we increase bandwidth from \(O({{\log ^3}n}\,{+}\,D{\log n})\) to \(O(\kappa {{\log ^3}n}\,{+}\,D{\log n})\) but reduce round complexity from \(O({{\log ^2}n}\log {\log n})\) to \(O({\log n})\);

(2) Compared to the generic garbled circuit 2PC [29] applied to Circuit-ORAM, we weaken the security model, from \((t,m)=(1,2)\) to \((t,m)=(1,3)\), but reduce bandwidth from \(O(\kappa {{\log ^3}n}\,{+}\,\kappa D{\log n})\) to \(O(\kappa {{\log ^3}n}\,{+}\,D{\log n})\).

Thus for medium-sized records, \(D=\varOmega (\kappa {{\log ^2}n})\), our 3PC ORAM asymptotically matches client-server Path-ORAM in all aspects, and beats 2PC Circuit-ORAM by an \(\varOmega (\kappa )\) factor in bandwidth, without the dramatic increase in round complexity incurred by using generic 3PC techniques. In concrete terms, our round complexity is 50x lower than the generic 3PC Circuit-ORAM,^{Footnote 5} and, for \(D\,{>}\,1\) KB, our bandwidth is also \({>}50\)x lower than 2PC Circuit-ORAM. Our protocol is also competitive for small record sizes, e.g. \(D=4\) B: First, our bandwidth is only about 2x larger than the generic 3PC Circuit-ORAM; Second, our bandwidth is lower than the 2PC Circuit-ORAM by a factor between 10x and 20x for \(20\,{\le }\,{\log n}\,{\le }\,30\).

Fast System Response and Batch Retrieval. Another benefit of our 3PC ORAM is a fast system response, i.e. the time we call a Retrieval Phase, from an access request to the retrieval of the record. In fact, our protocol supports fast retrieval of a batch of requests, because the expensive post-processing of each access (i.e. the Circuit-ORAM eviction procedure) can be postponed for a batch of requests, allowing all of them to be processed at a smaller cost. Low-bandwidth batch retrieval with postponed eviction was recently shown for client-server Path-ORAM variants [11, 24] (see also [15]), and our protocol allows MPC ORAM to match this property in the 3PC setting.

Specifically, our protocol processes \(b\,{=}\,O({\log n})\) requests in \(3b\,{+}\,3h\) rounds, using \(3D\,{+}\,O({{\log ^2}n}\log {\log n})\) bandwidth per record, and to the best of our knowledge no other MPC ORAM allows batch-processing with such costs. After retrieving \(b\) requests the protocol must perform all evictions, using \(6b\) rounds and \(O(b(\kappa {{\log ^3}n}+D{\log n}))\) total bandwidth, but this can be postponed for any batch size that benefits the higher-level MPC application. Concretely, for \({\log n}\,{\le }\,30\), the per-record bandwidth for \(b\,{\le }\,4{\log n}\) is only \({\le }\,3D\,{+}\,10\) KB.
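As a rough illustration of these round counts, the following Python sketch tabulates them for one batch. The function name is hypothetical, and the recursion parameter `tau` is an input chosen by the caller rather than a value taken from the paper; \(h\) is the number of trees in the recursive position map, \(h=({\log n}/\tau )+1\), as defined in Sect. 2.

```python
import math


def batch_round_counts(b: int, log_n: int, tau: int) -> dict:
    """Round counts quoted in the text for a batch of b accesses.

    tau is the position-map recursion parameter (an assumption supplied by
    the caller); h = log_n / tau + 1 is the resulting number of trees.
    """
    h = math.ceil(log_n / tau) + 1   # number of trees in the recursion
    retrieval = 3 * b + 3 * h        # pipelined retrieval of all b requests
    eviction = 6 * b                 # postponed evictions for the whole batch
    return {"h": h, "retrieval": retrieval, "eviction": eviction}
```

Calling `batch_round_counts(b, log_n, tau)` for a few batch sizes shows how the retrieval cost grows by three rounds per extra request, while the eviction rounds can be paid later.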

Brief Overview of our 3PC ORAM. We sketch the main ideas behind our 3PC protocol that emulates Circuit-ORAM. Observe that the Circuit-ORAM client, like a client in any Binary-Tree ORAM variant, performs the following steps: (1) locate the searched record in the retrieved tree path, (2) post-process that record (free-up its current location, update its labels, and add the modified record to the path root), (3) determine the eviction map, i.e. the permutation on positions in the retrieved path according to which the records will be moved in eviction, and (4) move the records on the path according to the eviction map. The main design principle in our 3PC emulation of Circuit-ORAM is to implement steps (1), (2), and (4) using customized asymptotically bandwidth-optimal and constant-round protocols (we explain some of the challenges involved in Sect. 2), and leave the eviction map computation step as in 2PC Circuit-ORAM, implemented with generic constant-round secure computation, namely garbled circuits. Circuit-ORAM computes the eviction map via data-dependent scans, which we do not know how to implement in constant rounds without the garbled circuit overhead. However, computation of the eviction map involves only metadata, and is independent of record payloads. Hence even though using garbled circuits in this step takes \(O(\kappa )\) bandwidth per input bit, this is upper-bounded by the cost of bandwidth-optimal realization of the data movement step (4) already for \(D\,{\approx }\,140\)B.

Secondly, we utilize the 3PC setting in the retrieval phase, to keep its bandwidth especially low, namely \(O(D\,{+}\,{{\log ^2}n}\log {\log n})\). The key ingredient is a 3-party Secret-Shared PIR (SS-PIR) gadget, which computes a secret-sharing of record \(\mathsf {M}[\mathrm {N}]\) given a secret-sharing of array \(\mathsf {M}\) and of address \(\mathrm {N}\). We construct SS-PIR from any 2-server PIR [13] whose servers’ responses form an xor-sharing of the retrieved record, which is the case for many 2-PIR schemes [2, 9, 20]. Another component is a one-round bandwidth-optimal compiler from 3PC SS-PIR to 3PC Keyword SS-PIR, which retrieves shared value given a sharing of keyword and of (keyword,value) list. With a careful design we use only three rounds for the retrieval and post-processing steps, which allows pipelined processing of a batch of accesses using only three rounds per tree.

Roadmap. We overview the technical challenges of our construction in Sect. 2. We present our 3PC ORAM protocol in Sect. 3, argue its security in Sect. 4, and discuss our prototype performance in Sect. 5. For lack of space, all specialized sub-protocols our protocol requires are deferred to [21], Appendix A. The full security argument, the specification of garbled circuits we use, and further prototype performance data, are all included in [21], Appendices B-E.

2 Technical Overview

Overview of Path ORAM [26]. Our 3PC Circuit-ORAM is a 3PC secure computation of Circuit-ORAM of [27] (see footnote 1), which is a variant of Path-ORAM of Stefanov et al. [26]. We thus start by recalling Path-ORAM of [26], casting it in terms which are convenient in our context. Let \(\mathsf {M}\) be an array of \(n\) records of size \(D\) each. Server \(\mathsf {S}\) keeps a binary tree of depth \({\log n}\), denoted \(\mathsf {tree}\), shown in Fig. 2, where each node is a bucket of a small constant size \(w\), except the root bucket (a.k.a. a stash) which has size \(s\,{=}\,O({\log n})\). Each tree bucket is a list of tuples, which are records with four fields, \(\mathsf {fb}\), \(\mathsf {lb}\), \(\mathsf {adr}\), and \(\mathsf {data}\). For each address \(\mathrm {N}\in \{0{,}1\}^{\log n}\), record \(\mathsf {M}[\mathrm {N}]\) is stored in a unique tuple \(\mathsf {T}\) in \(\mathsf {tree}\) s.t. \(\mathsf {T}.(\mathsf {fb},\mathsf {lb},\mathsf {adr},\mathsf {data})=(1,\mathrm {L},\mathrm {N},\mathsf {M}[\mathrm {N}])\) where \(\mathsf {fb}\) is a full/empty tuple status bit and \(\mathrm {L}\) is a label which defines a tree leaf assigned at random to address \(\mathrm {N}\).

Data-structure \(\mathsf {tree}\) satisfies an invariant that a tuple with label \(\mathrm {L}\) lies in a bucket on the path from the root to leaf \(\mathrm {L}\), denoted \(\mathsf {tree}.\mathsf {path}(\mathrm {L})\). To access address \(\mathrm {N}\), client \(\mathsf {C}\) uses a (recursive) position map \(\mathsf {PM}\,{:}\,\mathrm {N}\,{\rightarrow }\,\mathrm {L}\) (see below) to find leaf \(\mathrm {L}\) corresponding to \(\mathrm {N}\), sends \(\mathrm {L}\) to \(\mathsf {S}\) to retrieve \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\), searches \(\mathsf {path}\) for \(\mathsf {T}\,{=}\,(1,\mathrm {L},\mathrm {N},\mathsf {M}[\mathrm {N}])\) with fields \((\mathsf {fb},\mathsf {adr})\) matching \((1,\mathrm {N})\), assigns a new random leaf \(\mathrm {L}'\) to \(\mathrm {N}\), adds a modified tuple \(\mathsf {T}' = (1,\mathrm {L}',\mathrm {N},\mathsf {M}[\mathrm {N}])\) to the root bucket in \(\mathsf {path}\) (in case of a write access \(\mathsf {C}\) also replaces \(\mathsf {M}[\mathrm {N}]\) in \(\mathsf {T}'\) with a new entry), and erases the old \(\mathsf {T}\) from \(\mathsf {path}\) by flipping \(\mathsf {T}.\mathsf {fb}\) to 0. Finally, to avoid overflow, \(\mathsf {C}\) evicts tuples in \(\mathsf {path}\) as far down as possible without breaking the invariant or overflowing any bucket.
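The path invariant can be sketched in a few lines of Python. The encoding below is an assumption made for illustration only: a leaf label is an integer of `depth` bits, and a bucket is named by its level together with the label prefix that reaches it. Neither the function names nor this encoding come from the paper.

```python
def path_buckets(leaf_label: int, depth: int) -> list[tuple[int, int]]:
    """Buckets on tree.path(L), root first, as (level, prefix-of-L) pairs.

    The bucket at a given level is identified by the top `level` bits of
    the leaf label, so the root is (0, 0) and the leaf is (depth, L).
    """
    return [(level, leaf_label >> (depth - level)) for level in range(depth + 1)]


def deepest_legal_level(tuple_label: int, path_label: int, depth: int) -> int:
    """Deepest level on path(path_label) where a tuple labeled tuple_label
    may sit without breaking the invariant, i.e. the length of the common
    prefix of the two labels."""
    diff = tuple_label ^ path_label
    # The highest differing bit marks where the two root-to-leaf paths split.
    return depth - diff.bit_length()
```

Under this encoding, eviction along \(\mathsf {path}\) may push a tuple no deeper than `deepest_legal_level` of its own label against the retrieved leaf label, which is the "as far down as possible" constraint of the text, ignoring bucket capacity.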

Position map \(\mathsf {PM}\,{:}\,\mathrm {N}\,{\rightarrow }\,\mathrm {L}\) is stored using the same data-structure, with each tuple storing labels corresponding to a batch of \(2^{\tau }\) consecutive addresses, for some constant \(\tau \). Since such a position map has only \(2^{{\log n}}/2^{\tau }=2^{{\log n}-\tau }\) entries, this recursion results in \(h=({\log n}/\tau )\,{+}\,1\) trees \(\mathsf {tree}_0,...,\mathsf {tree}_{h-1}\) which work as follows (see Fig. 3): Divide \(\mathrm {N}\) into \(\tau \)-bit blocks \(\mathrm {N}_1,...,\mathrm {N}_{h-1}\). The top-level tree, \(\mathsf {tree}_{h{-}1}\), contains the records of \(\mathsf {M}\) as described above, shown in Fig. 2, while for \(i<h{-}1\), \(\mathsf {tree}_i\) is a binary tree of depth \(d_i=i\tau \) which implements position map \(\mathsf {PM}_i\) that matches address prefix \(\mathrm {N}_{[1,...,i{+}1]}\,{=}\,\mathrm {N}_1|...|\mathrm {N}_{i{+}1}\) to leaf \(\mathrm {L}_{i{+}1}\) assigned to this prefix in \(\mathsf {tree}_{i{+}1}\). Access algorithm ORAM.Access traverses this data-structure by sequentially retrieving the labels assigned to each prefix of the searched-for address, using an algorithm we denote ORAM.ML. For \(i\) from 0 to \(h{-}1\), algorithm \(\mathsf{\small {ORAM}}.\mathsf{\small {ML}}\) retrieves \(\mathrm {L}_{i{+}1}=\mathsf {PM}_i(\mathrm {N}_1|...|\mathrm {N}_{i{+}1})\) from \(\mathsf {tree}_i\) using the following steps: (1) it identifies path \(\mathsf {path}=\mathsf {tree}_i.\mathsf {path}(\mathrm {L}_i)\) in \(\mathsf {tree}_i\) using label \(\mathrm {L}_i\), (2) it identifies tuple \(\mathsf {T}\) in \(\mathsf {path}\) s.t. \(\mathsf {T}.\mathsf {adr}=\mathrm {N}_1|...|\mathrm {N}_i\), and (3) it returns \(\mathrm {L}_{i{+}1}=\mathsf {T}.\mathsf {data}[\mathrm {N}_{i{+}1}]\).

Circuit-ORAM vs. Path-ORAM. Circuit-ORAM (see footnote 1) follows the same algorithm as Path-ORAM except (1) the eviction procedure is restricted in that it moves only selected tuples down the path in \(\mathsf {path}\), as we discuss further below; and (2) it performs the eviction on two paths in each tree per access. Our 3PC emulation of Circuit-ORAM also runs twice per each tree per access, but since the second execution is limited to eviction, for simplicity of presentation we omit it in all discussion below, except when we report performance data.

Top-Level Design of 3PC Circuit-ORAM. The client algorithm in all variants of Binary-Tree ORAM, which includes Path-ORAM and Circuit-ORAM, consists of the following phases:

1. Retrieval, which given \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\) and address prefix \(\mathrm {N}\), locates tuple \(\mathsf {T}=(1,\mathrm {L},\mathrm {N},\mathsf {data})\) in \(\mathsf {path}\) and retrieves next-level label (or record) in \(\mathsf {data}\);
2. Post-Process, which removes \(\mathsf {T}\) from \(\mathsf {path}\), injects new labels into \(\mathsf {T}\), and re-inserts it in the root (=stash);
3. Eviction, which can be divided into two sub-steps:
   (a) Eviction Logic: An eviction map \(\mathsf{\small {EM}}\) is computed, by function denoted Route, on input label \(\mathrm {L}\) and the metadata fields \((\mathsf {fb},\mathsf {lb})\) of tuples in \(\mathsf {path}\),
   (b) Data Movement: Permute tuples in \(\mathsf {path}\) according to map \(\mathsf{\small {EM}}\).

Our 3PC ORAM is a secure emulation of the above procedure, with the Eviction Logic function Route instantiated as in Circuit-ORAM, and it performs all the above steps on the sharings of inputs \(\mathsf {tree}\) and \(\mathrm {N}\), given label \(\mathrm {L}\) as a public input known to all parties. With the exception of the next-level label recovered in Retrieval, all other variables remain secret-shared. Our implementation of the above steps resembles the 3PC ORAM emulation of Binary-Tree ORAM by [14] in that we use garbled circuit for Eviction Logic, and specialized 3PC protocols for Retrieval, Post-Process, and Data Movement. However, our implementations are different from [14]: First, to enable low-bandwidth batch processing of retrieval we use different sharings and protocols in Retrieval and Post-Process. Second, to securely “glue” Eviction Logic and Data Movement we need to mask mapping \(\mathsf{\small {EM}}\) computed by Eviction Logic and implement Data Movement given this masked mapping. We explain both points in more detail below.

Low-Bandwidth 3PC Retrieval. The Retrieval phase realizes a Keyword Secret-Shared PIR (Kw-SS-PIR) functionality: The parties hold a sharing of an array of (keyword, value) pairs, and a sharing of a searched-for keyword, and the protocol must output a sharing of the value in the (keyword, value) pair that contains the matching keyword. In our case the address prefix \(\mathrm {N}_{[1,i]}\) is the searched-for keyword and \(\mathsf {path}\) is the array of the (keyword, value) pairs where keywords are address fields \(\mathsf {adr}\) and values are payload fields \(\mathsf {data}\).
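The input/output behaviour of Kw-SS-PIR can be written down as an ideal functionality. The Python sketch below is not the protocol: it plays a trusted party that reconstructs everything, searches, and hands back a fresh sharing. It assumes, for illustration only, that all values are 3-party xor-shared integers, and it ignores the \(\mathsf {fb}\) flag; all names are hypothetical.

```python
import secrets


def xor_share(value: int, bits: int, parties: int = 3) -> list[int]:
    """Split value into `parties` random shares whose xor equals value."""
    shares = [secrets.randbits(bits) for _ in range(parties - 1)]
    last = value
    for s in shares:
        last ^= s
    return shares + [last]


def xor_reconstruct(shares: list[int]) -> int:
    """Recombine xor shares into the underlying value."""
    out = 0
    for s in shares:
        out ^= s
    return out


def kw_ss_pir_ideal(shared_pairs, shared_keyword, value_bits: int) -> list[int]:
    """Ideal Kw-SS-PIR: return a fresh sharing of the value whose keyword matches.

    shared_pairs is a list of (keyword_shares, value_shares).  A real protocol
    must reach the same output without any party seeing the reconstructed
    keyword, the matching index, or the value.
    """
    keyword = xor_reconstruct(shared_keyword)
    for kw_shares, val_shares in shared_pairs:
        if xor_reconstruct(kw_shares) == keyword:
            return xor_share(xor_reconstruct(val_shares), value_bits)
    raise KeyError("no pair carries the searched-for keyword")
```

In the setting of the text, the keywords would be the \(\mathsf {adr}\) fields of the tuples in \(\mathsf {path}\) and the values their \(\mathsf {data}\) fields.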

The 3PC implementation of Retrieval in [14] has \(O(\ell D)\) bandwidth where \(\ell \,{=}\,O({\log n})\) is the number of tuples in \(\mathsf {path}\), and here we reduce it to \(3D\,{+}\,O(\ell \log \ell )\) as follows: First, we re-use the Keyword Search protocol KSearch of [14] to create a secret-sharing of index j of a location of the keyword-matching tuple in \(\mathsf {path}\). This subprotocol reduces the problem to finding an index where a secret-shared array of length \(\ell \) contains an all-zero string, which has \(\varTheta (\ell \log \ell )\) communication complexity. Our KSearch implementation has \(2\ell (c+\log \ell )\) bandwidth where \(2^{-c}\) is the probability of having to re-run KSearch because of collisions in \(\ell \) pairs of \((c+\log \ell )\)-bit hash values. The overall bandwidth is optimal for \(c\,{\approx }\,\log \log \ell \), but we report performance numbers for \(c\,{=}\,20\).

Secondly, we use a Secret-Shared PIR (SS-PIR) protocol, which creates a fresh sharing of the j-th record given the shared array and the shared index j. We implement SS-PIR in two rounds from any 2-server PIR [13] whose servers’ PIR responses form an xor-sharing of the retrieved record. Many 2-PIR schemes have this property, e.g. [2, 9, 20], but we exemplify this generic construction with the simplest form of 2-server PIR of Chor et al. [9] which has \(3\ell +3D\) bandwidth. This is not optimal in \(\ell \), but in our case \(\ell \,{\le }\,150\,{+}\,b\), where \(b\) is the number of accesses with postponed eviction; moreover, the optimized version of SS-PIR sends only \({\approx }\ell {+}3D\) bits on-line, and KSearch already sends \(O(\ell \log \ell )\) bits. Our generic 2-PIR to 3PC-SS-PIR compiler is simple (a variant of it appeared in [20]) but the 3-round 3PC Kw-SS-PIR protocol is to the best of our knowledge novel.
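The property relied on here, that the two servers' responses form an xor-sharing of the retrieved record, is easy to see in the simplest 2-server PIR of Chor et al. The Python sketch below shows only that plain PIR, with the database held in the clear by both servers; it is not the 3PC SS-PIR compiler, and the function names are hypothetical.

```python
import secrets


def pir_queries(ell: int, j: int) -> tuple[list[int], list[int]]:
    """Client side: two selection vectors that differ only at position j.

    Each vector on its own is a uniformly random bit string, so neither
    server learns anything about j.
    """
    q1 = [secrets.randbits(1) for _ in range(ell)]
    q2 = list(q1)
    q2[j] ^= 1
    return q1, q2


def pir_answer(db: list[int], query: list[int]) -> int:
    """Server side: xor of all records selected by the query vector."""
    answer = 0
    for record, bit in zip(db, query):
        if bit:
            answer ^= record
    return answer
```

Every record other than the j-th is selected either by both queries or by neither, so it cancels, and `pir_answer(db, q1) ^ pir_answer(db, q2)` equals `db[j]`. The two answers are therefore an xor-sharing of the record, and each query costs \(\ell \) bits.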

Efficient 3PC Circuit-ORAM Eviction. In Eviction we use a simple Data Movement protocol, with 2 rounds and \(\approx 2|\mathsf {path}|\) bandwidth. With three parties denoted as \((\mathsf {C},\mathsf {D},\mathsf {E})\), our protocol creates a two-party \((\mathsf {C},\mathsf {E})\)-sharing of \(\mathsf {path}'=\mathsf{\small {EM}}(\mathsf {path})\) from a \((\mathsf {C},\mathsf {E})\)-sharing of \(\mathsf {path}\) if party \(\mathsf {D}\) holds eviction map \(\mathsf{\small {EM}}\) in the clear. Naively outputting \(\mathsf{\small {EM}}\) to party \(\mathsf {D}\) is insecure, as the eviction map is correlated with the ORAM access pattern, so the question is whether \(\mathsf{\small {EM}}\) can be masked by some randomizing permutation known by \(\mathsf {C}\) and \(\mathsf {E}\). [14] had an easy solution for its binary tree ORAM variant because its algorithm Route outputs a regular \(\mathsf{\small {EM}}\), i.e. buckets on every level of the retrieved \(\mathsf {path}\) except the last always move two tuples down to the next level, so all [14] needed to do was to randomly permute tuples on each bucket level of \(\mathsf {path}\), and the resulting new \(\mathsf{\small {EM}}'\) on the permuted \(\mathsf {path}\) leaks no information on \(\mathsf{\small {EM}}\). By contrast, the Circuit-ORAM eviction map is non-regular (see Fig. 4): Its bucket level map \(\varPhi \) of \(\mathsf{\small {EM}}\) can move a tuple by variable distance and can leave some buckets untouched, both of which are correlated with the density of tuples in \(\mathsf {path}\), and thus with ORAM access pattern.

Thus our goal is to transform the underlying Circuit-ORAM eviction map \(\mathsf{\small {EM}}= (\varPhi ,\mathsf {t})\) into a map whose distribution does not depend on the data (\(\varPhi \) describes the bucket-level movement, while \(\mathsf {t}\) is an array containing one tuple index from each bucket that will be moved). We do so in two steps. First, we add an extra empty tuple to each bucket and we modify Circuit-ORAM algorithm Route to expand function \(\varPhi :\mathsf {Z}_{d}{\rightarrow }\mathsf {Z}_{d}\cup \{\perp \}\) into a cyclic permutation \(\sigma \) on \(\mathsf {Z}_{d}\) (\(d\) is the depth of \(\mathsf {path}\), \(\mathsf {Z}_{d}\) is the set \(\{0,...,d-1\}\)), by adding spurious edges to \(\varPhi \) in the deterministic way illustrated in Fig. 4. Second, we apply two types of masks to the resulting output \((\sigma ,\mathsf {t})\) of Route, namely a random permutation \(\pi \) on \(\mathsf {Z}_{d}\) and two arrays \((\delta ,\rho )\), each of which contains a random tuple index in each bucket. Our Eviction Logic protocol will use \((\pi ,\delta ,\rho )\) to mask \((\sigma ,\mathsf {t})\) by computing \((\sigma ^{\circ },\mathsf {t}^{\circ })\) s.t. \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) (permutation composition) and \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}\oplus \delta )\). And now we have a masked eviction map \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) that can be revealed to party \(\mathsf {D}\) but does not leak information on \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) or \(\mathsf{\small {EM}}_{\varPhi ,\mathsf {t}}\).
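The masking arithmetic can be checked in the clear with a short Python sketch. It is only an illustration of the two formulas \(\sigma ^{\circ }=\pi \cdot \sigma \cdot \pi ^{-1}\) and \(\mathsf {t}^{\circ }=\rho \oplus \pi (\mathsf {t}\oplus \delta )\), not of the garbled-circuit protocol that computes them. It assumes that permutations are Python lists with `p[i]` standing for \(p(i)\), that \(\pi (y)\) places \(y[k]\) at position \(\pi (k)\), and that tuple indices are small integers xor-ed bitwise; all function names are hypothetical.

```python
import secrets

_rng = secrets.SystemRandom()


def random_perm(d: int) -> list[int]:
    """Uniformly random permutation on Z_d, as a list p with p[i] = p(i)."""
    p = list(range(d))
    _rng.shuffle(p)
    return p


def random_array(d: int, bits: int) -> list[int]:
    """Array of d independent random `bits`-bit values (one-time pad material)."""
    return [secrets.randbits(bits) for _ in range(d)]


def compose(p: list[int], s: list[int]) -> list[int]:
    """(p . s)(i) = p(s(i))."""
    return [p[s[i]] for i in range(len(s))]


def inverse(p: list[int]) -> list[int]:
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv


def apply_perm(pi: list[int], y: list) -> list:
    """pi(y): the element at position k moves to position pi(k)."""
    out = [None] * len(y)
    for k, v in enumerate(y):
        out[pi[k]] = v
    return out


def xor_arrays(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


def mask_eviction_map(sigma, t, pi, delta, rho):
    """Return (sigma_masked, t_masked) as in the two masking formulas."""
    sigma_m = compose(compose(pi, sigma), inverse(pi))
    t_m = xor_arrays(rho, apply_perm(pi, xor_arrays(t, delta)))
    return sigma_m, t_m


def unmask_t(t_m, pi, delta, rho):
    """Invert the masking of t, given all three masks."""
    return xor_arrays(delta, apply_perm(inverse(pi), xor_arrays(t_m, rho)))


def cycle_length(p: list[int], start: int = 0) -> int:
    """Length of the cycle of p that contains `start`."""
    n, i = 1, p[start]
    while i != start:
        i = p[i]
        n += 1
    return n
```

Because conjugation preserves cycle structure, \(\sigma ^{\circ }\) is again a single cycle of length \(d\) for every choice of \(\pi \), which is why the first step expands \(\varPhi \) into a full cycle before masking: a partial map would keep its data-dependent shape under conjugation.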

3 Our Protocol: 3PC Emulation of Circuit-ORAM

Protocol Parties. We use \(\mathsf {C},\mathsf {D},\mathsf {E}\) to denote the three parties participating in 3PC-ORAM. We use \(x^{\mathsf {P}}\) to denote that variable x is known only to party \(\mathsf {P}\in \{\mathsf {C},\mathsf {D},\mathsf {E}\}\), \(x^{\mathsf {P}_1\mathsf {P}_2}\) if x is known to \(\mathsf {P}_1\) and \(\mathsf {P}_2\), and x if known to all parties.

Shared Variables, Bitstrings, Secret-Sharing. Each pair of parties \(\mathsf {P}_1,\mathsf {P}_2\) in our protocol is initialized with a shared seed to a Pseudorandom Generator (PRG), which allows them to generate any number of shared (pseudo)random objects. We write if \(\mathsf {P}_1\) and \(\mathsf {P}_2\) both sample x uniformly from set \(\mathsf {S}\) using the PRG on a jointly held seed. We use several forms of secret-sharing, and here introduce four of them which are used in our high level protocols 3PC-ORAM.Access and 3PC-ORAM.ML (Algorithms 1 and 2):

Integer Ranges, Permutations. We define \(\mathsf {Z}_{n}\) as set \(\{0,...,n{-}1\}\), and \(\mathsf {perm}_{n}\) as the set of permutations on \(\mathsf {Z}_{n}\). If \(\pi ,\sigma \in \mathsf {perm}_{n}\) then \(\pi ^{-1}\) is an inverse permutation of \(\pi \), and \(\pi \cdot \sigma \) is a composition of \(\sigma \) and \(\pi \), i.e. \((\pi \cdot \sigma )(i)=\pi (\sigma (i))\).

Arrays. We use \(\mathsf {array}^{m}[\ell ]\) to denote arrays of \(\ell \) bitstrings of size m, and we write \(\mathsf {array}[\ell ]\) if m is implicit. We use x[i] to denote the i-th item in array x. Note that \(x\in \mathsf {array}^{m}[\ell ]\) can also be viewed as a bitstring in \(\{0{,}1\}^{\ell m}\).

Permutations, Arrays, Array Operations. Permutation \(\sigma \in \mathsf {perm}_{\ell }\) can be viewed as an array \(x\in \mathsf {array}^{\log \ell }[\ell ]\), i.e. \(x=[\sigma (0),...,\sigma (\ell {-}1)]\). For \(\pi \in \mathsf {perm}_{\ell }\) and \(y\in \mathsf {array}[\ell ]\) we use \(\pi (y)\) to denote an array containing elements of y permuted according to \(\pi \), i.e. \(\pi (y)=[y_{\pi ^{-1}(0)},...,y_{\pi ^{-1}(\ell -1)}]\).
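These conventions translate directly into list operations. The following Python sketch, with hypothetical function names, stores a permutation as the array \([\sigma (0),...,\sigma (\ell {-}1)]\) and implements composition and the action \(\pi (y)\) exactly as defined above, so that the two definitions can be checked against each other.

```python
def compose(pi: list[int], sigma: list[int]) -> list[int]:
    """Array form of pi . sigma, i.e. (pi . sigma)(i) = pi(sigma(i))."""
    return [pi[sigma[i]] for i in range(len(sigma))]


def inverse(pi: list[int]) -> list[int]:
    """Array form of the inverse permutation."""
    inv = [0] * len(pi)
    for i, v in enumerate(pi):
        inv[v] = i
    return inv


def apply_perm(pi: list[int], y: list) -> list:
    """pi(y) = [y[pi^{-1}(0)], ..., y[pi^{-1}(l-1)]]."""
    inv = inverse(pi)
    return [y[inv[i]] for i in range(len(y))]
```

With this definition the item at position \(k\) of \(y\) ends up at position \(\pi (k)\) of \(\pi (y)\), and applying \(\sigma \) and then \(\pi \) to an array agrees with applying the composition \(\pi \cdot \sigma \) once.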

Garbled Circuit Wire Keys. If variable \(x\in \{0{,}1\}^m\) is an input/output in circuit C, and \(\mathsf {wk}\in \mathsf {array}^{\kappa }[m,2]\) is the set of wire key pairs corresponding to this variable in the garbled version of C, then \({}\{\mathsf {wk}\,{:}\,x\}\in \mathsf {array}^{\kappa }[m]\) denotes the wire-key representation of value x on these wires, i.e. \({}\{\mathsf {wk}\,{:}\,x\}=\{\mathsf {wk}[i][x[i]]\}_{i=1}^m\). If the set of keys is implicit we will denote \({}\{\mathsf {wk}\,{:}\,x\}\) as \(\overline{x}\).
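A minimal Python sketch of this notation follows. It models only the encoding of a bit string by wire keys, not garbling or evaluation. The key length of 16 bytes (\(\kappa =128\)), the 0-based wire indexing, and the function names are assumptions made for illustration.

```python
import secrets

KAPPA_BYTES = 16  # assumed key length, i.e. kappa = 128 bits


def gen_wire_keys(m: int) -> list[tuple[bytes, bytes]]:
    """wk in array^kappa[m, 2]: one (key-for-0, key-for-1) pair per wire."""
    return [
        (secrets.token_bytes(KAPPA_BYTES), secrets.token_bytes(KAPPA_BYTES))
        for _ in range(m)
    ]


def encode(wk: list[tuple[bytes, bytes]], x_bits: list[int]) -> list[bytes]:
    """{wk : x}: for each wire i pick the key that stands for bit x[i]."""
    return [wk[i][x_bits[i]] for i in range(len(x_bits))]


def decode(wk: list[tuple[bytes, bytes]], keys: list[bytes]) -> list[int]:
    """Recover x from {wk : x}; possible only for a party that holds wk."""
    return [pair.index(key) for pair, key in zip(wk, keys)]
```

A party that sees only `encode(wk, x)` holds one random-looking key per wire and learns nothing about \(x\), while a party that also holds `wk` can decode it.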

3PC ORAM Protocol. Our 3PC ORAM protocol, 3PC-ORAM.Access, Algorithm 1, performs the same recursive scan through data-structure \(\mathsf {tree}_0,...,\mathsf {tree}_{h{-}1}\) as the client-server Path-ORAM (and Circuit-ORAM) described in Sect. 2, except it runs on inputs in secret-sharing format. The main loop of 3PC-ORAM.Access, i.e. protocol 3PC-ORAM.ML, Algorithm 2, also follows the corresponding client-server algorithm ORAM.ML, except that apart from the current-level leaf label \(\mathrm {L}\), which is known to all parties, all its other inputs are secret-shared as well.

Protocol 3PC-ORAM.ML calls subprotocols whose round/bandwidth specifications are stated in Fig. 5. (We omit computation costs because they are all comparable to link-encryption of communicated data). The low costs of these subprotocols are enabled by different forms of secret-sharings, e.g. xor versus additive, or 2-party versus 3-party, and by low-cost (or no cost) conversions between them. For implementations of these protocols we refer to [21] Appendix A.

Three Phases of 3PC-ORAM.ML: Protocol 3PC-ORAM.ML computes on sharing for \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\) and it contains the same three phases as the client-server Path-ORAM, but implemented with specialized 3PC protocols:

(1) Retrieval runs protocol KSearch to compute “shift” (i.e. additive) sharing of index for tuple \(\mathsf {T}\,{=}\,\mathsf {path}[j]\) in \(\mathsf {path}\) s.t. \(\mathsf {path}[j].\mathsf {adr}\,{=}\,\mathrm {N}\) and \(\mathsf {path}[j].\mathsf {fb}\,{=}\,1\), i.e. it is the unique (and non-empty) tuple pertaining to address prefix \(\mathrm {N}\); Then it runs protocol 3ShiftPIR to extract sharing of the payload \(X=\mathsf {path}[j].\mathsf {data}\) of this tuple, given sharings and ; In parallel to \(\mathsf{\small {3ShiftPIR}}\) it also runs protocol 3ShiftXorPIR to publicly reconstruct the next-level label stored at position \(\varDelta {\mathrm {N}}\) in this tuple’s payload, i.e. \(\mathrm {L}_{i+1}\,{=}\,(\mathsf {path}[j].\mathsf {data})[\varDelta {\mathrm {N}}]\), given sharing and . This construction of the Retrieval emulation allows for presenting protocols 3ShiftPIR and 3ShiftXorPIR (see resp. Algorithm 9 and 11 in [21], Appendix A) as generic SS-PIR constructions from a class of 2-Server PIR protocols. However, a small modification of this design achieves better round and on-line bandwidth parameters, see an Optimizations and Efficiency Discussion paragraph below.

(2) Post-process runs the Update-Label-in-Tuple protocol ULiT to form sharing of a new tuple using sharing of the retrieved tuple’s payload, sharings and of the address prefix and the next address chunk, and sharings of new labels; In parallel to \(\mathsf{\small {ULiT}}\) it also runs protocol FlipFlag to flip the full/empty flag to 0 in the old version of this tuple in \(\mathsf {path}\), which executes on inputs the sharings of field \(\mathsf {fb}\) of tuples in \(\mathsf {path}\) and on the “shift” sharing ; Once ULiT terminates the parties can insert into sharing of the root bucket in \(\mathsf {path}\). At this point the root bucket has size \(s{+}1\) (or \(s{+}b\) if we postpone eviction for a batch of \(b\) accesses).

(3) Eviction emulates Circuit-ORAM eviction on the sharing of the path involved in retrieval (or on another path, because 3PC-ORAM.Access, just like client-server Circuit-ORAM, performs eviction on two paths per access). It uses the generic garbled circuit protocol GC(Route) to compute the Circuit-ORAM eviction map (appropriately masked), and then runs protocols PermBuckets, PermTuples, and XOT to apply this (masked) eviction map to the secret-shared path. We discuss the eviction steps in more detail below.

Eviction Procedure. As we explain in Sect. 2, we split Eviction into Eviction Logic, which uses a garbled circuit subprotocol to compute the eviction map \(\mathsf{\small {EM}}\), and Eviction Movement, which uses special-purpose protocols to apply \(\mathsf{\small {EM}}\) to the shared path, i.e. to the sharing of \(\mathsf {path}\) in protocol 3PC-ORAM.ML. However, recall that revealing the eviction map to any party would leak information about path density, and consequently the access pattern. We avoid this leakage in two steps: First, we modify the Circuit-ORAM eviction logic computation Route, so that when it computes the bucket-level map \(\varPhi \) and the tuple pointers array \(\mathsf {t}\), which define an eviction map \(\mathsf{\small {EM}}_{\varPhi ,\mathsf {t}}\), the algorithm scans through the buckets once more to expand the partial map \(\varPhi \) into a complete cycle \(\sigma \) over the \(d\) buckets (see Fig. 4). (We include the modified Circuit-ORAM algorithm Route in [21], Appendix D.) Second, the garbled circuit computation \(\mathsf{\small {GC}}(\mathsf{\small {Route}})\), see Step 6, Algorithm 2, does not output \((\sigma ,\mathsf {t})\) to \(\mathsf {D}\) in the clear: Instead, it outputs \(\mathsf {t}'\,{=}\,\mathsf {t}\oplus \delta \), where \(\delta \) is a random array, used here as a one-time pad, and the garbled wire encoding of the bits of \(\sigma \,{=}\,[\sigma (1),...,\sigma (d)]\), i.e. the output wire keys \(\{\mathsf {wk}\,{:}\,\sigma \}\,{=}\,\{\mathsf {wk}[i][\sigma [i]]\}_{i=1}^{d\log d}\).

Recall that we want \(\mathsf {D}\) to compute \((\sigma ^{\circ },\mathsf {t}^{\circ })\), a masked version of \((\sigma ,\mathsf {t})\), where \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) and \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}\oplus \delta )\), for \(\pi \) a random permutation on \(\mathsf {Z}_{d}\) and \(\delta ,\rho \) random arrays, all picked by \(\mathsf {C}\) and \(\mathsf {E}\). This is done by protocol PermBuckets, which takes 2 on-line rounds to let \(\mathsf {D}\) translate \(\{\mathsf {wk}\,{:}\,\sigma \}\) into \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) given \(\mathsf {wk}\) held by \(\mathsf {E}\) and \(\pi \) held by \(\mathsf {C},\mathsf {E}\), and (in parallel) PermTuples, which takes 2 rounds to let \(\mathsf {D}\) translate \(\mathsf {t}'\,{=}\,\mathsf {t}\oplus \delta \) into \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}')\) given \(\pi ,\rho \) held by \(\mathsf {C},\mathsf {E}\). Then \(\mathsf {C},\mathsf {E}\) permute their sharing of \(\mathsf {path}\) by \(\varPi =\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\), where \(\ddot{\pi }\), \(\tilde{\delta }\), and \(\tilde{\rho }\) are permutations on the \(\ell =d\cdot (w{+}1)\) tuples in the path induced by \(\pi ,\delta ,\rho \):

\(\pi \in \mathsf {perm}_{d}\) defines \(\ddot{\pi }\in \mathsf {perm}_{\ell }\) s.t. \(\ddot{\pi }(j,t)=(\pi (j),t)\), i.e. \(\ddot{\pi }\) moves position \(t\) in bucket \(j\) to position \(t\) in bucket \(\pi (j)\);
\(\delta \in \mathsf {array}^{\log {(w{+}1)}}[d]\) defines \(\tilde{\delta }\in \mathsf {perm}_{\ell }\) s.t. \(\tilde{\delta }(j,t)=(j,t\oplus \delta [j])\), i.e. \(\tilde{\delta }\) moves position \(t\) in bucket \(j\) to position \(t\oplus \delta [j]\) in bucket \(j\); same for \(\rho \) and \(\tilde{\rho }\).

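Now use protocol XOT in 2 rounds and \({\approx }\,2|\mathsf {path}|\) bandwidth to apply map \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) held by \(\mathsf {D}\) to the permuted sharing of \(\mathsf {path}\). The result is a sharing of \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \) applied to \(\mathsf {path}\), and when \(\mathsf {C},\mathsf {E}\) apply \(\varPi ^{-1}\) to it they get a sharing of \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \) applied to \(\mathsf {path}\). Finally, the sharing of the evicted path in the format used for the tree can be reconstructed from this sharing in 1 round and \(2|\mathsf {path}|\) bandwidth (see [21], Appendix A for secret-sharing conversions and reasoning), and can then be injected into the shared tree.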
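The mask, move, and unmask pipeline above can be sanity-checked on plain (unshared) positions. The sketch below assumes conventions that the text does not spell out: \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) moves position \((j,\mathsf {t}[j])\) to \((\sigma (j),\mathsf {t}[\sigma (j)])\) and fixes all other positions, a permuted array satisfies \(\pi (a)[\pi (j)]=a[j]\), and products of permutations apply the rightmost factor first. All function names are hypothetical, and Python's `random` module is used only to get a reproducible test; it is not a cryptographic source of randomness.

```python
import random


def random_cycle(d, rng):
    """Return sigma (as a list) forming one cycle over range(d)."""
    order = list(range(d))
    rng.shuffle(order)
    sigma = [0] * d
    for a, b in zip(order, order[1:] + order[:1]):
        sigma[a] = b
    return sigma


def eviction_map(sigma, t):
    """Assumed EM_{sigma,t}: (j, t[j]) -> (sigma[j], t[sigma[j]]), else identity."""
    def em(pos):
        j, slot = pos
        return (sigma[j], t[sigma[j]]) if slot == t[j] else pos
    return em


def mask(sigma, t, pi, delta, rho):
    """Compute sigma° = pi*sigma*pi^-1 and t° = rho XOR pi(t XOR delta)."""
    d = len(sigma)
    sigma_m, t_m = [0] * d, [0] * d
    for j in range(d):
        sigma_m[pi[j]] = pi[sigma[j]]
        t_m[pi[j]] = rho[pi[j]] ^ t[j] ^ delta[j]
    return sigma_m, t_m


def big_pi(pi, delta, rho):
    """Return Pi = rho~ * pi.. * delta~ and its inverse as position maps."""
    inv_pi = [0] * len(pi)
    for j, pj in enumerate(pi):
        inv_pi[pj] = j

    def forward(pos):
        j, slot = pos
        return (pi[j], slot ^ delta[j] ^ rho[pi[j]])

    def inverse(pos):
        k, slot = pos
        j = inv_pi[k]
        return (j, slot ^ rho[k] ^ delta[j])

    return forward, inverse


def check_masked_eviction(d=8, slots=4, seed=0):
    """Check Pi^-1 * EM_{sigma°,t°} * Pi == EM_{sigma,t} on every position.

    `slots` plays the role of w+1 and must be a power of two so that
    XOR-ing slot indices stays in range.
    """
    rng = random.Random(seed)
    sigma = random_cycle(d, rng)
    t = [rng.randrange(slots) for _ in range(d)]
    pi = list(range(d))
    rng.shuffle(pi)
    delta = [rng.randrange(slots) for _ in range(d)]
    rho = [rng.randrange(slots) for _ in range(d)]

    sigma_m, t_m = mask(sigma, t, pi, delta, rho)
    forward, inverse = big_pi(pi, delta, rho)
    em, em_masked = eviction_map(sigma, t), eviction_map(sigma_m, t_m)
    return all(
        inverse(em_masked(forward((j, s)))) == em((j, s))
        for j in range(d)
        for s in range(slots)
    )
```

The check operates on positions only; in the protocol the same maps are applied to secret-shared tuples, with \(\mathsf {D}\) seeing only the masked pair and \(\mathsf {C},\mathsf {E}\) holding the masks.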

Eviction Correctness. We claim that the eviction protocol described above implements mapping \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) applied to \(\mathsf {path}\), i.e. that (note that \((\tilde{x})^{-1}=\tilde{x}\)):

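\(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \,{=}\,\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\), where \(\varPi =\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\)   (1)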

Consider the set of points \(S=\{(j,\mathsf {t}[j])\, |\, j\in \mathsf {Z}_{d}\}\) which are moved by the left hand side (LHS) permutation \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\). To argue that Eq. (1) holds we first show that the RHS permutation maps any point \((j,\mathsf {t}[j])\) of S in the same way as the LHS permutation:

It remains to argue that RHS is an identity on points not in S, just like LHS. Observe that set \(S'\) of tuples moved by \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) consists of the following tuples:

Also note that:

which means that \(S'\,{=}\,\varPi (S)\), so if \((j,t)\,{\not \in }\,S\) then \(\varPi (j,t)\,{\not \in }\,S'\), hence \((\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi )(j,t)\,{=}\,\varPi (j,t)\), and hence \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \) and \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) are equal on \((j,t)\,{\not \in }\,S\).
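As a sketch of the calculation behind this argument, assume (these conventions are not spelled out in the text above) that \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) moves \((j,\mathsf {t}[j])\) to \((\sigma (j),\mathsf {t}[\sigma (j)])\), that a permuted array satisfies \(\pi (a)[\pi (j)]=a[j]\), and that products of permutations apply the rightmost factor first. Then \(\varPi \) maps points of \(S\) to points of \(S'\):

\[
\varPi (j,\mathsf {t}[j]) = \tilde{\rho }\bigl(\ddot{\pi }(j,\mathsf {t}[j]\oplus \delta [j])\bigr) = \bigl(\pi (j),\,\rho [\pi (j)]\oplus \mathsf {t}[j]\oplus \delta [j]\bigr) = \bigl(\pi (j),\,\mathsf {t}^{\circ }[\pi (j)]\bigr)
\]

Using \(\sigma ^{\circ }(\pi (j))=\pi (\sigma (j))\), the conjugated map then acts on points of \(S\) as follows:

\[
\begin{aligned}
(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi )(j,\mathsf {t}[j])
&= \varPi ^{-1}\bigl(\sigma ^{\circ }(\pi (j)),\,\mathsf {t}^{\circ }[\sigma ^{\circ }(\pi (j))]\bigr) \\
&= \varPi ^{-1}\bigl(\pi (\sigma (j)),\,\mathsf {t}^{\circ }[\pi (\sigma (j))]\bigr) \\
&= \varPi ^{-1}\bigl(\varPi (\sigma (j),\mathsf {t}[\sigma (j)])\bigr) \\
&= (\sigma (j),\mathsf {t}[\sigma (j)]) = \mathsf{\small {EM}}_{\sigma ,\mathsf {t}}(j,\mathsf {t}[j])
\end{aligned}
\]

Under the same assumptions, the first identity also gives \(S'=\varPi (S)\), since \(\pi \) ranges over all of \(\mathsf {Z}_{d}\).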

Optimizations and Efficiency. As mentioned above, we can improve on both bandwidth and rounds in the Retrieval phase of 3PC-ORAM.ML shown in Algorithm 2. The optimization comes from an observation that our protocol KSearch (see Algorithm 6 in [21], Appendix A) takes just one round to compute a shift-sharing of index \(j\), and its second round is a resharing which converts this shift-sharing into the sharing format taken as input by the PIR protocols. This round of resharing can be saved, and we can re-arrange protocols 3ShiftPIR and 3ShiftXorPIR so they use only the output of the first round as input and effectively piggyback creating the rest of the sharing, in such a way that the modified protocols, denoted resp. 3ShiftPIR-Mod and 3ShiftXorPIR-Mod, take 2 rounds. This makes the whole Retrieval take only 3 rounds, hence access protocol 3PC-ORAM.Access takes \(3h\) rounds in Retrieval, and, surprisingly, the same is true for Retrieval with Post-Processing. For further explanations we refer to [21].

4 Security

Protocol 3PC-ORAM of Sect. 3 is a three-party secure computation of an Oblivious RAM functionality, i.e. it can implement RAM for any 3PC protocol in the RAM model. To state this formally we define a Universally Composable (UC) Oblivious RAM functionality \(\mathsf {F}_{\mathsf {ORAM}}\) for 3-party computation (3PC) in the framework of Canetti [7], and we argue that our 3PC ORAM realizes \(\mathsf {F}_{\mathsf {ORAM}}\) in the setting of \(m\,{=}\,3\) parties with honest majority, i.e. only \(t\,{=}\,1\) party is (statically) corrupted, assuming honest-but-curious (HbC) adversary, i.e. corrupted party follows the protocol. We assume secure pairwise links between the three parties. Since we have static-corruptions, HbC adversary, and non-rewinding simulators, security holds even if communication is asynchronous.

3PC ORAM Functionality. Functionality \(\mathsf {F}_{\mathsf {ORAM}}\) is parametrized by address and record sizes, resp. \({\log n}\) and \(D\), and it takes command \(\mathsf {Init}\), which initializes an empty array \(\mathsf {M}\in \mathsf {array}^{D}[n]\), and command \(\mathsf {Access}(\mathsf {instr},\mathrm {N},\mathsf {rec}')\) for \((\mathsf {instr},\mathrm {N},\mathsf {rec}')\in \{\mathsf {read},\mathsf {write}\}\times \{0{,}1\}^{{\log n}}\times \{0{,}1\}^{D}\), which returns a fresh secret-sharing of record \(\mathsf {rec}\,{=}\,\mathsf {M}[\mathrm {N}]\), and if \(\mathsf {instr}\,{=}\,\mathsf {write}\) it also assigns \(\mathsf {M}[\mathrm {N}]\,{:=}\,\mathsf {rec}'\). Technically, \(\mathsf {F}_{\mathsf {ORAM}}\) needs each of the three participating parties to make the call, where each party provides their part of the sharing, and \(\mathsf {F}_{\mathsf {ORAM}}\)’s output is also delivered in the form of a corresponding share to each party. However, in the HbC setting all parties are assumed to follow the instructions provided by an environment algorithm \(\mathsf {Z}\), which models a higher-level protocol which utilizes \(\mathsf {F}_{\mathsf {ORAM}}\) to implement oblivious memory access. Hence we can simply assume that \(\mathsf {Z}\) sends \(\mathsf {Init}\) and \(\mathsf {Access}(\mathsf {instr},\mathrm {N},\mathsf {rec}')\) to \(\mathsf {F}_{\mathsf {ORAM}}\) and receives \(\mathsf {rec}\) in return.
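The input/output behaviour of the functionality described above can be sketched as a small class. This is only a toy model under stated assumptions: the class and method names are hypothetical, records are byte strings rather than \(D\)-bit strings, the empty array is modelled as all-zero records, the returned sharing is a 3-way XOR sharing chosen for illustration, and a write is assumed to return the record stored before the assignment.

```python
import secrets


def share_xor3(record: bytes):
    """Fresh 3-way XOR sharing of a record (illustrative sharing format)."""
    s1 = secrets.token_bytes(len(record))
    s2 = secrets.token_bytes(len(record))
    s3 = bytes(r ^ a ^ b for r, a, b in zip(record, s1, s2))
    return s1, s2, s3


def reconstruct(shares) -> bytes:
    """XOR the three shares back together."""
    s1, s2, s3 = shares
    return bytes(a ^ b ^ c for a, b, c in zip(s1, s2, s3))


class IdealOram:
    """Toy model of the ideal functionality: Init, then Access(instr, N, rec')."""

    def __init__(self, log_n: int, record_bytes: int):
        self.n = 1 << log_n
        self.record_bytes = record_bytes
        self.memory = None

    def init(self):
        # Assumption: "empty" is modelled as n all-zero records.
        self.memory = [bytes(self.record_bytes) for _ in range(self.n)]

    def access(self, instr: str, address: int, new_record: bytes = None):
        if self.memory is None:
            raise RuntimeError("Init must be called before Access")
        if instr not in ("read", "write"):
            raise ValueError("instr must be 'read' or 'write'")
        record = self.memory[address]  # rec = M[N]
        if instr == "write":
            if new_record is None or len(new_record) != self.record_bytes:
                raise ValueError("write needs a record of the configured size")
            self.memory[address] = new_record  # M[N] := rec'
        # One share would be delivered to each of the three parties.
        return share_xor3(record)
```

For example, after `f = IdealOram(3, 4)` and `f.init()`, calling `f.access("write", 5, b"abcd")` followed by `f.access("read", 5)` yields shares that reconstruct to `b"abcd"`.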

Security of our 3PC ORAM. Since our protocol is a three-party secure emulation of Circuit-ORAM, we prove that it securely realizes \(\mathsf {F}_{\mathsf {ORAM}}\) in the \((t,m)\,{=}\,(1,3)\) setting if Circuit-ORAM defines a secure Client-Server ORAM, which implies security of 3PC-ORAM by the argument for Circuit-ORAM security given in [27]. We note that protocol 3PC-ORAM.Access of Sect. 3 implements only procedure \(\mathsf {Access}\). Procedure \(\mathsf {Init}\) can be implemented by running 3PC-ORAM.Access with \(\mathsf {instr}\,{=}\,\mathsf {write}\) in a loop for \(\mathrm {N}\) from 0 to \(n{-}1\) (and arbitrary \(\mathsf {rec}'\)’s), but this requires some adjustments in 3PC-ORAM.Access and 3PC-ORAM.ML to deal with initialization of random label assignments and their linkage. We leave the specification of these (straightforward) adjustments to the full version, and our main security claim, stated as Corollary 1 below, assumes that \(\mathsf {Init}\) is executed by a trusted party.

For lack of space we defer the proof of Corollary 1 to [21], Appendix C. Very briefly, the proof uses the UC framework, arguing that each protocol securely realizes its intended input/output functionality if each subprotocol it invokes realizes its idealized input/output functionality. All subprotocols executed by protocol 3PC-ORAM.ML of Sect. 3 are accompanied by brief security arguments which argue precisely this statement. As for 3PC-ORAM.ML, its security proof, given in [21], Appendix C, is centered around two facts argued in Sect. 3, namely that our way of implementing the Circuit-ORAM eviction map, with \(\mathsf {D}\) holding \(\sigma ^{\circ }=\pi \cdot \sigma \cdot \pi ^{-1}\) and \(\mathsf {t}^{\circ }=\rho \oplus \pi (\mathsf {t}\oplus \delta )\) and \(\mathsf {E},\mathsf {C}\) holding \(\pi ,\rho ,\delta \), (1) is correct, because \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi =\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) for \(\varPi =\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\), and (2) leaks no information to either party, because random \(\pi ,\rho ,\delta \) induce random \(\sigma ^{\circ },\mathsf {t}^{\circ }\) in \(\mathsf {D}\)’s view.

Corollary 1 (from [21], Appendix C). Assuming secure initialization, 3PC-ORAM.Access is a UC-secure realization of 3PC ORAM functionality \(\mathsf {F}_{\mathsf {ORAM}}\).

5 Performance Evaluation

We tested a Java prototype of our 3PC Circuit-ORAM, with garbled circuits implemented using the ObliVM library by Wang [27], on three AWS EC2 c4.2xlarge servers, with communication links encrypted using AES-128. Each c4.2xlarge instance is equipped with eight Intel Xeon E5-2666 v3 CPUs (2.9 GHz) and 15 GB memory, and has 1 Gbps bandwidth. (However, our tested prototype utilizes multi-threading only in parallel Eviction, see below.)

In the discussion below we use the following acronyms:

- cust-3PC: our 3PC Circuit-ORAM protocol;

- gen-3PC: generic 3PC Circuit-ORAM using 3PC of Araki et al. [1];

- 2PC: 2PC Circuit-ORAM [27];

- C/S: the client-server Path-ORAM [26].

Wall Clock Time. Figure 6 shows online timing of cust-3PC for small record sizes (\(D\,{=}\,4\)B) as a function of address size \({\log n}\). It includes Retrieval wall clock time (WC), End-to-End (Retrieval+PostProcess+Eviction) WC, and End-to-End WC with parallel Eviction for all trees, which shows a 60% reduction in WC due to better CPU utilization. Note that Retrieval takes about 8 milliseconds for \({\log n}\,{=}\,30\) (i.e. \(2^{30}\) records), and that Eviction takes only about 4–5 times longer. Recall that the Retrieval phase has \(3h\) rounds while Eviction has \(6\) rounds, which accounts for the much smaller CPU utilization in Retrieval.

CPU Time. We compare total and online CPU time of cust-3PC and 2PC in Fig. 7 with respect to memory size \(n\), for \(D=4\)B (see Note 6). Since 2PC implementation [27] does not provide online/offline separation, we approximate 2PC online CPU time by its garbled circuit evaluation time, because 2PC costs due to OT’s can be pushed to precomputation. As Fig. 7 shows, the cust-3PC CPU costs are between 6x and 10x lower than in 2PC, resp. online and total, already for \({\log n}=25\), and the gap widens for higher \(n\). In [21], Appendix E.2 we include CPU time comparison with respect to \(D\), which shows CPU ratio of 2PC over cust-3PC grows to \({\approx }\,25\) for \(D\ge 10\) KB.

Bandwidth Comparison with Generic 3PC. Timing results depend on many factors (language, network, CPU, and more), and bandwidth is a more reliable predictor of performance for protocols using only light symmetric crypto. In Fig. 8 we compare online bandwidth of cust-3PC, gen-3PC, and C/S, as a function of the address size \({\log n}\), for \(D=4\)B. We see for small records our cust-3PC is only a factor of 2x worse than the optimal-bandwidth gen-3PC (which, recall, has completely impractical round complexity). In [21], Appendix E.2 we show that as \(D\) grows, cust-3PC beats gen-3PC in bandwidth for \(D{\ge }1\) KB.

Bandwidth Comparison with 2PC ORAMs. In Fig. 9 we compare total bandwidth of cust-3PC and several 2PC ORAM schemes, including 2PC, the DPF-based FLORAM scheme of [12], the 2PC SQRT-ORAM of [30], and a trivial linear-scan scheme. Our cust-3PC bandwidth is competitive with FLORAM for all \(n\)’s, but for \({\log n}\,{\ge }\,24\) the \(O(\sqrt{n})\) asymptotics of FLORAM takes over. Note also that FLORAM uses \(O(n)\) local computation vs. our \(O({{\log ^3}n})\), so in the FLORAM case bandwidth comparison does not suffice. Indeed, for \(n=2^{30}\) and \(D=4\)B, [12] report \(>1\) s overall processing time on LAN vs. 40 msec for us.

For further discussions of bandwidth and CPU time with respect to record size \(D\), and cust-3PC CPU time component, refer to [21], Appendix E.2.

Notes

1. In this paper we call the client-server ORAM implicit in [27] “Circuit-ORAM”, and its garbled-circuit 2PC implementation, also shown in [27], “2PC Circuit-ORAM”.
2. We use Path-ORAM as a client-server baseline for these comparisons because Path-ORAM has the most “MPC-friendly” client, hence most MPC ORAM’s emulate securely either Path-ORAM or its predecessor, Binary-Tree ORAM [25]. (The recent 2PC ORAM of [12] is an exception, discussed below.)
3. Using the BGW-style MPC over an arithmetic circuit for Circuit-ORAM, as was done by Keller and Scholl for another Path-ORAM variant [22], should also yield a bandwidth-competitive 3PC ORAM, but with round complexity at least \(\varOmega ({{\log ^2}n})\).
4. 2PC ORAM cost of [12] has stash linear scan \(O(T\kappa {\log n})\) and amortized re-init \(O(nD/T)\). Picking \(T=O(\sqrt{nD/(\kappa {\log n})})\) we get \(O(\sqrt{\kappa Dn{\log n}})\). In [12] this is rendered as \(O(\sqrt{n})\) overhead, assuming \(D=\varOmega ({\log n})\) and omitting \(\kappa \). [12] also show \(O(1)\)-round 2PC ORAM, but at the price of increased bandwidth and computation.
5. We estimate that the circuit depth of the Circuit-ORAM client, and hence the round complexity of the generic 3PC Circuit-ORAM, is \({>}\,1000\) even for \(n\,{=}\,2^{20}\), compared to \({\approx }15\) rounds in our 3PC ORAM and \({\approx }8\) in the client-server Path-ORAM.
6. We include CPU comparisons only with 2PC Circuit-ORAM, and not SQRT-ORAM [30] and DPF-ORAM [12], because the former uses the same Java ObliVM GC library while the latter two use the C library Obliv-C. Still, note that for \({\log n}=30\), the on-line computation due to FSS evaluation and linear memory scans contributes over 1 sec to wall-clock in [12], while our on-line wall-clock comes to 40 msec.
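
The choice of \(T\) in Note 4 comes from balancing the two cost terms. As a short worked derivation (constants omitted):

\[
T\kappa \log n = \frac{nD}{T}
\;\Longleftrightarrow\;
T^{2} = \frac{nD}{\kappa \log n}
\;\Longleftrightarrow\;
T = \sqrt{\frac{nD}{\kappa \log n}}
\]

\[
T\kappa \log n + \frac{nD}{T}
= \sqrt{\frac{nD}{\kappa \log n}}\cdot \kappa \log n + nD\cdot \sqrt{\frac{\kappa \log n}{nD}}
= 2\sqrt{\kappa Dn\log n}
\]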

References

1. Araki, T., Furukawa, J., Lindell, Y., Nof, A., Ohara, K.: High-throughput semi-honest secure three-party computation with an honest majority. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016, pp. 805–817 (2016)
2. Beimel, A., Ishai, Y., Malkin, T.: Reducing the servers computation in private information retrieval: PIR with preprocessing. J. Cryptol. 17, 125–151 (2004)
3. Ben-Or, M., Goldwasser, S., Wigderson, A.: Completeness theorems for non-cryptographic fault-tolerant distributed computation. In: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC 1988, pp. 1–10. ACM, New York (1988)
4. Bogdanov, D., Kamm, L., Kubo, B.: Students and taxes: a privacy-preserving study using secure computation. In: Proceedings on Privacy Enhancing Technologies (PET), pp. 117–135 (2016)
5. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacy-preserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88313-5_13
6. Bogetoft, P., et al.: Secure multiparty computation goes live. In: Dingledine, R., Golle, P. (eds.) FC 2009. LNCS, vol. 5628, pp. 325–343. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03549-4_20
7. Canetti, R.: Universally composable security: a new paradigm for cryptographic protocols. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS 2001. IEEE Computer Society, Washington, DC (2001)
8. Chaum, D., Crépeau, C., Damgård, I.: Multiparty unconditionally secure protocols (extended abstract). In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing, 2–4 May 1988, Chicago, Illinois, USA, pp. 11–19 (1988)
9. Chor, B., Kushilevitz, E., Goldreich, O., Sudan, M.: Private information retrieval. J. ACM 45(6), 965–981 (1998)
10. Damgård, I., Meldgaard, S., Nielsen, J.B.: Perfectly secure oblivious RAM without random oracles. In: Ishai, Y. (ed.) TCC 2011. LNCS, vol. 6597, pp. 144–163. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19571-6_10
11. Devadas, S., van Dijk, M., Fletcher, C.W., Ren, L., Shi, E., Wichs, D.: Onion ORAM: a constant bandwidth blowup oblivious RAM. In: Kushilevitz, E., Malkin, T. (eds.) TCC 2016. LNCS, vol. 9563, pp. 145–174. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49099-0_6
12. Doerner, J., Shelat, A.: Scaling ORAM for secure computation. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, pp. 523–535. ACM, New York (2017)
13. Dvir, Z., Gopi, S.: 2 server PIR with subpolynomial communication. J. ACM 63(4), 39:1–39:15 (2016)
14. Faber, S., Jarecki, S., Kentros, S., Wei, B.: Three-party ORAM for secure computation. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9452, pp. 360–385. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48797-6_16
15. Fletcher, C.W., Naveed, M., Ren, L., Shi, E., Stefanov, E.: Bucket ORAM: single online roundtrip, constant bandwidth oblivious RAM. IACR Cryptology ePrint Archive, 2015:1065 (2015)
16. Gentry, C., Goldman, K.A., Halevi, S., Jutla, C., Raykova, M., Wichs, D.: Optimizing ORAM and using it efficiently for secure computation. In: De Cristofaro, E., Wright, M. (eds.) PETS 2013. LNCS, vol. 7981, pp. 1–18. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39077-7_1
17. Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game. In: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC 1987, pp. 218–229. ACM, New York (1987)
18. Goldreich, O., Ostrovsky, R.: Software protection and simulation on oblivious RAMs. J. ACM 43(3), 431–473 (1996)
19. Gordon, S.D., Katz, J., Kolesnikov, V., Krell, F., Malkin, T., Raykova, M., Vahlis, Y.: Secure two-party computation in sublinear (amortized) time. In: Computer and Communications Security (CCS), CCS 2012, pp. 513–524 (2012)
20. Ishai, Y., Kushilevitz, E., Lu, S., Ostrovsky, R.: Private large-scale databases with distributed searchable symmetric encryption. In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 90–107. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_6
21. Jarecki, S., Wei, B.: 3PC ORAM with low latency, low bandwidth, and fast batch retrieval. IACR Cryptology ePrint Archive, 2018:347 (2018)
22. Keller, M., Scholl, P.: Efficient, oblivious data structures for MPC. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8874, pp. 506–525. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45608-8_27
23. Ostrovsky, R., Shoup, V.: Private information storage (extended abstract). In: Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, 4–6 May 1997, pp. 294–303 (1997)
24. Ren, L., Fletcher, C., Kwon, A., Stefanov, E., Shi, E., Van Dijk, M., Devadas, S.: Constants count: practical improvements to oblivious RAM. In: Proceedings of the 24th USENIX Conference on Security Symposium, SEC 2015, pp. 415–430. USENIX Association, Berkeley (2015)
25. Shi, E., Chan, T.-H.H., Stefanov, E., Li, M.: Oblivious RAM with O((logN)³) worst-case cost. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 197–214. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_11
26. Stefanov, E., van Dijk, M., Shi, E., Fletcher, C., Ren, L., Yu, X., Devadas, S.: Path ORAM: an extremely simple oblivious RAM protocol. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer Communications Security, CCS 2013, pp. 299–310. ACM, New York (2013)
27. Wang, X., Chan, H., Shi, E.: Circuit ORAM: on tightness of the Goldreich-Ostrovsky lower bound. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 850–861. ACM, New York (2015)
28. Wang, X.S., Huang, Y., Chan, T.-H.H., Shelat, A., Shi, E.: SCORAM: oblivious RAM for secure computation. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, pp. 191–202. ACM, New York (2014)
29. Yao, A.C.-C.: Protocols for secure computations (extended abstract). In: Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, FOCS 1982, pp. 160–164 (1982)
30. Zahur, S., Wang, X., Raykova, M., Gascón, A., Doerner, J., Evans, D., Katz, J.: Revisiting square-root ORAM: efficient random access in multi-party computation. In: Proceedings of the 37th IEEE Symposium on Security and Privacy (“Oakland”). IEEE (2016)

Author information

Authors and Affiliations

University of California, Irvine, USA
Stanislaw Jarecki & Boyang Wei

Corresponding author

Correspondence to Stanislaw Jarecki.

Editor information

Editors and Affiliations

imec-COSIC, KU Leuven, Heverlee, Belgium
Bart Preneel
imec-COSIC, KU Leuven, Heverlee, Belgium
Frederik Vercauteren

About this paper

Cite this paper

Jarecki, S., Wei, B. (2018). 3PC ORAM with Low Latency, Low Bandwidth, and Fast Batch Retrieval. In: Preneel, B., Vercauteren, F. (eds) Applied Cryptography and Network Security. ACNS 2018. Lecture Notes in Computer Science(), vol 10892. Springer, Cham. https://doi.org/10.1007/978-3-319-93387-0_19

DOI: https://doi.org/10.1007/978-3-319-93387-0_19
Published: 10 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93386-3
Online ISBN: 978-3-319-93387-0
eBook Packages: Computer Science, Computer Science (R0)
