1 Introduction

MPC ORAM. Multi-Party Computation Oblivious Random Access Memory (MPC ORAM), or Secure-Computation ORAM (SC ORAM), is a protocol which lets \(m\) parties implement access to a secret-shared memory in such a way that both memory records and the accessed locations remain hidden, and this security guarantee holds as long as no more than \(t\) out of \(m\) parties are corrupted. Applications of MPC ORAM stem from the fact that it can implement a random memory access subprocedure within secure computation of any RAM program. Classic approaches to secure computation [3, 8, 17, 29] express computation as a Boolean or arithmetic circuit, thus their size, and consequently efficiency, is inherently lower-bounded by the size of their inputs. In practice this eliminates the possibility of secure computation involving large data, including such fundamental computing functionality as search and information retrieval. MPC ORAM makes such computation feasible because it generalizes secure computation from circuits to RAM programs: All RAM program instructions can be implemented using circuit-based MPC, since they involve only local variables, while access to (large) memory can be implemented with MPC ORAM.

As an application of MPC of a RAM program, and hence of MPC ORAM, consider an MPC Database, i.e. an MPC implementation of processing of database queries over a secret-shared database. A typical database implementation would hash a searched keyword to determine an address of a hash table page whose content is then matched against the queried keyword. Standard MPC techniques can implement the hashing step, but the retrieval of the hash page is a random access to a large memory. Implementing this RAM access via garbled circuits requires \(\varOmega (nD\kappa )\) bandwidth, where \(n\) is the number of records, \(D\) is the record size, and \(\kappa \) is the cryptographic security parameter, which makes such computation unrealistic even for 1 MB databases. By contrast, using MPC ORAM can cost \(O(\mathsf {poly}({\log n})D\kappa )\) and hence, in principle, can scale to large data.

Inefficiency Gap in MPC ORAM Constructions. The general applicability of MPC ORAM to MPC of any RAM program motivates searching for efficient MPC ORAM realizations. As pointed out in [10, 23], any ORAM with its client implemented with an MPC protocol yields MPC ORAM. This motivates searching for an ORAM with an MPC-friendly client, i.e. a client which can be efficiently computed using MPC techniques [16, 19, 22, 27, 28]. Indeed, the recent Circuit-ORAM proposal of Wang et al. [27] exhibits a variant of Path-ORAM of Stefanov et al. [26] whose client has a Boolean circuit of an asymptotically optimal size, i.e. a constant factor of the data which Path-ORAM client retrieves from the server, and which forms an input to its computation.

Still, in spite of the circuit-size optimality of Circuit-ORAM (footnote 1), applying generic honest-but-curious MPC protocols to it yields MPC ORAM solutions which are two orders of magnitude more expensive than Path-ORAM (footnote 2): Using Yao’s garbled circuit [29] on Circuit-ORAM yields a 2PC ORAM of [27] which has (asymptotically) the same round complexity as Path-ORAM, but its bandwidth, both online and in offline precomputation, is larger by an \(\varOmega (\kappa )\) factor. Alternatively, applying GMW [17] or BGW [3] to the Boolean circuit for Circuit-ORAM yields a 2PC or MPC ORAM which asymptotically preserves Path-ORAM bandwidth, but its round complexity is larger by an \(\varOmega ({\log n}\log {\log n})\) factor (compare footnote 3).

Our Contribution: 3PC ORAM with Low Latency and Bandwidth. We show that the gap between MPC ORAM and client-server ORAM can be bridged by exhibiting a 3PC ORAM, i.e. MPC for \(m\,{=}\,3\) servers with \(t\,{=}\,1\) fault, which uses customized, i.e. non-generic, 3PC protocols and asymptotically matches Path-ORAM in rounds, and, for record size \(D\,=\,\varOmega (\kappa {{\log ^2}n})\), in bandwidth. Specifically, our 3PC ORAM securely emulates the Circuit-ORAM client in the 3PC setting, using \(O({\log n})\) rounds and \(O(\kappa {{\log ^3}n}+D{\log n})\) bandwidth (see Fig. 1). We note that the 3PC setting of \((t,m)\) = (1, 3) gives weaker security than the 2PC setting of \((t,m)\) = (1, 2), but it was shown to enable lower-cost solutions to many secure computation problems compared to both 2PC and general \((t,m)\)-MPC (e.g. [1, 5]) and for that reason it is often chosen in secure computation implementations (e.g. [4, 6]). Here we show that 3PC benefits extend to MPC ORAM.

Fig. 1. Round and bandwidth comparisons, for \(n\): array size, \(D\): record size, \(\kappa \): cryptographic security parameter, \(\lambda \): statistical security parameter.

We show the benefits of our 3PC ORAM contrasted with previous 2PC and 3PC ORAM approaches in Fig. 1. In the 3PC setting we include a generic 3PC Circuit-ORAM, which results from implementing Circuit-ORAM with the generic 3PC protocol of Araki et al. [1], which is the most efficient 3PC instantiation we know of either the BGW or the GMW framework (footnote 3). The second 3PC ORAM we compare to is that of Faber et al. [14], which uses non-generic 3PC techniques, like we do, but it emulates in 3PC a Binary-Tree ORAM variant that is less efficient than Circuit-ORAM, yielding a 3PC ORAM with bandwidth worse than ours by an \(\varOmega (\lambda )\) factor. Regarding 2PC ORAM, several 2PC ORAMs based on Binary-Tree ORAM variants were given prior to Circuit-ORAM [16, 19, 22, 28], but we omit them from Fig. 1 because Circuit-ORAM outperforms them [27]. We include two recent alternative approaches, 2PC ORAM of [30] based on Square-Root ORAM of [18], and 2PC FLORAM of [12] based on the Distributed Point Function (DPF) of [20]. However, both of these 2PC ORAMs use \(O(\sqrt{n})\) bandwidth, and [12] also uses \(O(n)\) local computation, so they do not scale well to large \(n\) (footnote 4). Restricting the comparison to poly(\({\log n}\)) MPC ORAM, our 3PC ORAM offers the following trade-offs:

(1) Compared to the generic 3PC protocol [1] applied to Circuit-ORAM, we increase bandwidth from \(O({{\log ^3}n}\,{+}\,D{\log n})\) to \(O(\kappa {{\log ^3}n}\,{+}\,D{\log n})\) but reduce round complexity from \(O({{\log ^2}n}\log {\log n})\) to \(O({\log n})\);

(2) Compared to the generic garbled circuit 2PC [29] applied to Circuit-ORAM, we weaken the security model, from \((t,m)=(1,2)\) to \((t,m)=(1,3)\), but reduce bandwidth from \(O(\kappa {{\log ^3}n}\,{+}\,\kappa D{\log n})\) to \(O(\kappa {{\log ^3}n}\,{+}\,D{\log n})\).

Thus for medium-sized records, \(D=\varOmega (\kappa {{\log ^2}n})\), our 3PC ORAM asymptotically matches client-server Path-ORAM in all aspects, and beats 2PC Circuit-ORAM by an \(\varOmega (\kappa )\) factor in bandwidth, without the dramatic increase in round complexity incurred by generic 3PC techniques. In concrete terms, our round complexity is 50x lower than the generic 3PC Circuit-ORAM (footnote 5), and, for \(D\,{>}\,1\) KB, our bandwidth is also \({>}50\)x lower than 2PC Circuit-ORAM. Our protocol is also competitive for small record sizes, e.g. \(D=4\) B: First, our bandwidth is only about 2x larger than the generic 3PC Circuit-ORAM; Second, our bandwidth is lower than the 2PC Circuit-ORAM by a factor between 10x and 20x for \(20\,{\le }\,{\log n}\,{\le }\,30\).

Fast System Response and Batch Retrieval. Another benefit of our 3PC ORAM is a fast system response, i.e. the time we call a Retrieval Phase, from an access request to the retrieval of the record. In fact, our protocol supports fast retrieval of a batch of requests, because the expensive post-processing of each access (i.e. the Circuit-ORAM eviction procedure) can be postponed for a batch of requests, allowing all of them to be processed at a smaller cost. Low-bandwidth batch retrieval with postponed eviction was recently shown for client-server Path-ORAM variants [11, 24] (see also [15]), and our protocol allows MPC ORAM to match this property in the 3PC setting.

Specifically, our protocol processes \(b\,{=}\,O({\log n})\) requests in \(3b\,{+}\,3h\) rounds, using \(3D\,{+}\,O({{\log ^2}n}\log {\log n})\) bandwidth per record, and to the best of our knowledge no other MPC ORAM allows batch-processing with such costs. After retrieving \(b\) requests the protocol must perform all evictions, using \(6b\) rounds and \(O(b(\kappa {{\log ^3}n}+D{\log n}))\) total bandwidth, but this can be postponed for any batch size that benefits the higher-level MPC application. Concretely, for \({\log n}\,{\le }\,30\), the per-record bandwidth for \(b\,{\le }\,4{\log n}\) is only \({\le }\,3D\,{+}\,10\) KB.

Brief Overview of our 3PC ORAM. We sketch the main ideas behind our 3PC protocol that emulates Circuit-ORAM. Observe that the Circuit-ORAM client, like a client in any Binary-Tree ORAM variant, performs the following steps: (1) locate the searched record in the retrieved tree path, (2) post-process that record (free up its current location, update its labels, and add the modified record to the path root), (3) determine the eviction map, i.e. the permutation on positions in the retrieved path according to which the records will be moved in eviction, and (4) move the records on the path according to the eviction map. The main design principle in our 3PC emulation of Circuit-ORAM is to implement steps (1), (2), and (4) using customized asymptotically bandwidth-optimal and constant-round protocols (we explain some of the challenges involved in Sect. 2), and leave the eviction map computation step as in 2PC Circuit-ORAM, implemented with generic constant-round secure computation, namely garbled circuits. Circuit-ORAM computes the eviction map via data-dependent scans, which we do not know how to implement in constant rounds without the garbled circuit overhead. However, computation of the eviction map involves only metadata, and is independent of record payloads. Hence even though using garbled circuits in this step takes \(O(\kappa )\) bandwidth per input bit, this is upper-bounded by the cost of the bandwidth-optimal realization of the data movement step (4) already for \(D\,{\approx }\,140\) B.

Secondly, we utilize the 3PC setting in the retrieval phase, to keep its bandwidth especially low, namely \(O(D\,{+}\,{{\log ^2}n}\log {\log n})\). The key ingredient is a 3-party Secret-Shared PIR (SS-PIR) gadget, which computes a secret-sharing of record \(\mathsf {M}[\mathrm {N}]\) given a secret-sharing of array \(\mathsf {M}\) and of address \(\mathrm {N}\). We construct SS-PIR from any 2-server PIR [13] whose servers’ responses form an xor-sharing of the retrieved record, which is the case for many 2-PIR schemes [2, 9, 20]. Another component is a one-round bandwidth-optimal compiler from 3PC SS-PIR to 3PC Keyword SS-PIR, which retrieves a shared value given a sharing of a keyword and of a (keyword, value) list. With a careful design we use only three rounds for the retrieval and post-processing steps, which allows pipelined processing of a batch of accesses using only three rounds per tree.

Roadmap. We overview the technical challenges of our construction in Sect. 2. We present our 3PC ORAM protocol in Sect. 3, argue its security in Sect. 4, and discuss our prototype performance in Sect. 5. For lack of space, all specialized sub-protocols our protocol requires are deferred to [21], Appendix A. The full security argument, the specification of garbled circuits we use, and further prototype performance data, are all included in [21], Appendices B-E.

2 Technical Overview

Overview of Path ORAM [26]. Our 3PC Circuit-ORAM is a 3PC secure computation of Circuit-ORAM of [27] (see footnote 1), which is a variant of Path-ORAM of Stefanov et al. [26]. We thus start by recalling Path-ORAM of [26], casting it in terms which are convenient in our context. Let \(\mathsf {M}\) be an array of \(n\) records of size \(D\) each. Server \(\mathsf {S}\) keeps a binary tree of depth \({\log n}\), denoted \(\mathsf {tree}\), shown in Fig. 2, where each node is a bucket of a small constant size \(w\), except the root bucket (a.k.a. a stash) which has size \(s\,{=}\,O({\log n})\). Each tree bucket is a list of tuples, which are records with four fields, \(\mathsf {fb}\), \(\mathsf {lb}\), \(\mathsf {adr}\), and \(\mathsf {data}\). For each address \(\mathrm {N}\in \{0{,}1\}^{\log n}\), record \(\mathsf {M}[\mathrm {N}]\) is stored in a unique tuple \(\mathsf {T}\) in \(\mathsf {tree}\) s.t. \(\mathsf {T}.(\mathsf {fb},\mathsf {lb},\mathsf {adr},\mathsf {data})=(1,\mathrm {L},\mathrm {N},\mathsf {M}[\mathrm {N}])\) where \(\mathsf {fb}\) is a full/empty tuple status bit and \(\mathrm {L}\) is a label which defines a tree leaf assigned at random to address \(\mathrm {N}\).

Fig. 2. Path ORAM (final) tree

Fig. 3. Path ORAM recursive access

Data-structure \(\mathsf {tree}\) satisfies an invariant that a tuple with label \(\mathrm {L}\) lies in a bucket on the path from the root to leaf \(\mathrm {L}\), denoted \(\mathsf {tree}.\mathsf {path}(\mathrm {L})\). To access address \(\mathrm {N}\), client \(\mathsf {C}\) uses a (recursive) position map \(\mathsf {PM}\,{:}\,\mathrm {N}\,{\rightarrow }\,\mathrm {L}\) (see below) to find leaf \(\mathrm {L}\) corresponding to \(\mathrm {N}\), sends \(\mathrm {L}\) to \(\mathsf {S}\) to retrieve \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\), searches \(\mathsf {path}\) for \(\mathsf {T}\,{=}\,(1,\mathrm {L},\mathrm {N},\mathsf {M}[\mathrm {N}])\) with fields \((\mathsf {fb},\mathsf {adr})\) matching \((1,\mathrm {N})\), assigns a new random leaf \(\mathrm {L}'\) to \(\mathrm {N}\), adds a modified tuple \(\mathsf {T}' = (1,\mathrm {L}',\mathrm {N},\mathsf {M}[\mathrm {N}])\) to the root bucket in \(\mathsf {path}\) (in case of a write access \(\mathsf {C}\) also replaces \(\mathsf {M}[\mathrm {N}]\) in \(\mathsf {T}'\) with a new entry), and erases the old \(\mathsf {T}\) from \(\mathsf {path}\) by flipping \(\mathsf {T}.\mathsf {fb}\) to 0. Finally, to avoid overflow, \(\mathsf {C}\) evicts tuples in \(\mathsf {path}\) as far down as possible without breaking the invariant or overflowing any bucket.
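The access procedure just described can be sketched in a few lines of Python. This is only a toy, single-tree, client-server illustration under simplifying assumptions that are not part of the protocol: the position map is a plain dictionary, nothing is encrypted or secret-shared, the root bucket is unbounded (so it never overflows), and the class name `ToyPathORAM` and its methods are hypothetical names chosen for this sketch.

```python
import random

class ToyPathORAM:
    """Toy single-tree Path-ORAM: no recursion, no encryption, unbounded root."""

    def __init__(self, log_n, w=4):
        self.depth = log_n                     # leaves sit at level log_n
        self.w = w                             # capacity of every non-root bucket
        # buckets[(level, index)] is a list of tuples [fb, lb, adr, data]
        self.buckets = {(lv, i): [] for lv in range(log_n + 1) for i in range(1 << lv)}
        self.pm = {}                           # position map N -> L (in the clear here)

    def path(self, leaf):
        # Bucket identifiers from the root (level 0) down to the given leaf.
        return [(lv, leaf >> (self.depth - lv)) for lv in range(self.depth + 1)]

    def access(self, n, new_data=None):
        leaf = self.pm.get(n, random.randrange(1 << self.depth))
        path = self.path(leaf)
        data = None
        for b in path:                         # find T with (fb, adr) = (1, N)
            for t in self.buckets[b]:
                if t[0] == 1 and t[2] == n:
                    data, t[0] = t[3], 0       # read payload, flip fb to 0
        if new_data is not None:               # write access replaces the payload
            data = new_data
        new_leaf = random.randrange(1 << self.depth)
        self.pm[n] = new_leaf                  # assign fresh random label L'
        self.buckets[path[0]].append([1, new_leaf, n, data])
        self._evict(path)
        return data

    def _evict(self, path):
        # Greedy eviction: push full tuples as deep as their labels allow.
        pool = [t for b in path for t in self.buckets[b] if t[0] == 1]
        for lv in range(self.depth, -1, -1):
            level, idx = path[lv]
            fits = [t for t in pool if (t[1] >> (self.depth - level)) == idx]
            chosen = fits if level == 0 else fits[: self.w]
            self.buckets[path[lv]] = chosen
            taken = {id(t) for t in chosen}
            pool = [t for t in pool if id(t) not in taken]
```

After every call to `access`, each stored tuple lies on the path to its own label, which is exactly the invariant stated above.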

Position map \(\mathsf {PM}\,{:}\,\mathrm {N}\,{\rightarrow }\,\mathrm {L}\) is stored using the same data-structure, with each tuple storing labels corresponding to a batch of \(2^{\tau }\) consecutive addresses, for some constant \(\tau \). Since such a position map has only \(2^{{\log n}}/2^{\tau }=2^{{\log n}-\tau }\) entries, this recursion results in \(h=({\log n}/\tau )\,{+}\,1\) trees \(\mathsf {tree}_0,...,\mathsf {tree}_{h-1}\) which work as follows (see Fig. 3): Divide \(\mathrm {N}\) into \(\tau \)-bit blocks \(\mathrm {N}_1,...,\mathrm {N}_{h-1}\). The top-level tree, \(\mathsf {tree}_{h{-}1}\), contains the records of \(\mathsf {M}\) as described above, shown in Fig. 2, while for \(i<h{-}1\), \(\mathsf {tree}_i\) is a binary tree of depth \(d_i=i\tau \) which implements position map \(\mathsf {PM}_i\) that matches address prefix \(\mathrm {N}_{[1,...,i{+}1]}\,{=}\,\mathrm {N}_1|...|\mathrm {N}_{i{+}1}\) to leaf \(\mathrm {L}_{i{+}1}\) assigned to this prefix in \(\mathsf {tree}_{i{+}1}\). Access algorithm ORAM.Access traverses this data-structure by sequentially retrieving the labels assigned to each prefix of the searched-for address, using an algorithm we denote ORAM.ML. For \(i\) from 0 to \(h{-}1\), algorithm \(\mathsf{\small {ORAM}}.\mathsf{\small {ML}}\) retrieves \(\mathrm {L}_{i{+}1}=\mathsf {PM}_i(\mathrm {N}_1|...|\mathrm {N}_{i{+}1})\) from \(\mathsf {tree}_i\) using the following steps: (1) it identifies path \(\mathsf {path}=\mathsf {tree}_i.\mathsf {path}(\mathrm {L}_i)\) in \(\mathsf {tree}_i\) using label \(\mathrm {L}_i\), (2) it identifies tuple \(\mathsf {T}\) in \(\mathsf {path}\) s.t. \(\mathsf {T}.\mathsf {adr}=\mathrm {N}_1|...|\mathrm {N}_i\), and (3) it returns \(\mathrm {L}_{i{+}1}=\mathsf {T}.\mathsf {data}[\mathrm {N}_{i{+}1}]\).
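To make the indexing in the recursion concrete, the following Python sketch splits an address into \(\tau \)-bit blocks and lists, for each tree, the depth \(d_i=i\tau \), the \(\mathsf {adr}\) value searched for, and the slot of the payload that holds the next-level label. The function names `split_address` and `lookup_plan` are hypothetical, and the sketch assumes that \(\tau \) divides \({\log n}\) and that blocks are taken most-significant first.

```python
def split_address(n_addr, log_n, tau):
    """Split a log_n-bit address into tau-bit blocks N_1, ..., N_{h-1}."""
    assert log_n % tau == 0, "sketch assumes tau divides log n"
    mask = (1 << tau) - 1
    return [(n_addr >> (log_n - tau * (k + 1))) & mask for k in range(log_n // tau)]

def lookup_plan(n_addr, log_n, tau):
    """Per tree i: (i, depth d_i, adr prefix N_1|...|N_i, payload slot N_{i+1})."""
    blocks = split_address(n_addr, log_n, tau)
    h = len(blocks) + 1                       # h = (log n / tau) + 1 trees
    plan = []
    for i in range(h - 1):
        # tree_i is searched for prefix N_1|...|N_i; slot N_{i+1} holds L_{i+1}
        plan.append((i, i * tau, tuple(blocks[:i]), blocks[i]))
    # the top-level tree stores the record M[N] itself, so there is no slot
    plan.append((h - 1, (h - 1) * tau, tuple(blocks), None))
    return plan
```

For example, with \({\log n}=12\) and \(\tau =4\) the address `0xABC` yields blocks `[0xA, 0xB, 0xC]` and \(h=4\) trees of depths 0, 4, 8, and 12.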

Circuit-ORAM vs. Path-ORAM. Circuit-ORAM (see footnote 1) follows the same algorithm as Path-ORAM except (1) the eviction procedure is restricted in that it moves only selected tuples down the path in \(\mathsf {path}\), as we discuss further below; and (2) it performs the eviction on two paths in each tree per access. Our 3PC emulation of Circuit-ORAM also runs twice per tree per access, but since the second execution is limited to eviction, for simplicity of presentation we omit it in all discussion below, except when we report performance data.

Top-Level Design of 3PC Circuit-ORAM. The client algorithm in all variants of Binary-Tree ORAM, which includes Path-ORAM and Circuit-ORAM, consists of the following phases:

  1. Retrieval, which given \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\) and address prefix \(\mathrm {N}\), locates tuple \(\mathsf {T}=(1,\mathrm {L},\mathrm {N},\mathsf {data})\) in \(\mathsf {path}\) and retrieves next-level label (or record) in \(\mathsf {data}\);

  2. Post-Process, which removes \(\mathsf {T}\) from \(\mathsf {path}\), injects new labels into \(\mathsf {T}\), and re-inserts it in the root (=stash);

  3. Eviction, which can be divided into two sub-steps:

    (a) Eviction Logic: An eviction map \(\mathsf{\small {EM}}\) is computed, by function denoted Route, on input label \(\mathrm {L}\) and the metadata fields \((\mathsf {fb},\mathsf {lb})\) of tuples in \(\mathsf {path}\),

    (b) Data Movement: Permute tuples in \(\mathsf {path}\) according to map \(\mathsf{\small {EM}}\).

Our 3PC ORAM is a secure emulation of the above procedure, with the Eviction Logic function Route instantiated as in Circuit-ORAM, and it performs all the above steps on the sharings of inputs \(\mathsf {tree}\) and \(\mathrm {N}\), given label \(\mathrm {L}\) as a public input known to all parties. With the exception of the next-level label recovered in Retrieval, all other variables remain secret-shared. Our implementation of the above steps resembles the 3PC ORAM emulation of Binary-Tree ORAM by [14] in that we use garbled circuit for Eviction Logic, and specialized 3PC protocols for Retrieval, Post-Process, and Data Movement. However, our implementations are different from [14]: First, to enable low-bandwidth batch processing of retrieval we use different sharings and protocols in Retrieval and Post-Process. Second, to securely “glue” Eviction Logic and Data Movement we need to mask mapping \(\mathsf{\small {EM}}\) computed by Eviction Logic and implement Data Movement given this masked mapping. We explain both points in more detail below.

Low-Bandwidth 3PC Retrieval. The Retrieval phase realizes a Keyword Secret-Shared PIR (Kw-SS-PIR) functionality: The parties hold a sharing of an array of (keyword, value) pairs, and a sharing of a searched-for keyword, and the protocol must output a sharing of the value in the (keyword, value) pair that contains the matching keyword. In our case the address prefix \(\mathrm {N}_{[1,i]}\) is the searched-for keyword and \(\mathsf {path}\) is the array of the (keyword, value) pairs where keywords are address fields \(\mathsf {adr}\) and values are payload fields \(\mathsf {data}\).

The 3PC implementation of Retrieval in [14] has \(O(\ell D)\) bandwidth where \(\ell \,{=}\,O({\log n})\) is the number of tuples in \(\mathsf {path}\), and here we reduce it to \(3D\,{+}\,O(\ell \log \ell )\) as follows: First, we re-use the Keyword Search protocol KSearch of [14] to create a secret-sharing of index j of a location of the keyword-matching tuple in \(\mathsf {path}\). This subprotocol reduces the problem to finding an index where a secret-shared array of length \(\ell \) contains an all-zero string, which has \(\varTheta (\ell \log \ell )\) communication complexity. Our KSearch implementation has \(2\ell (c+\log \ell )\) bandwidth where \(2^{-c}\) is the probability of having to re-run KSearch because of collisions in \(\ell \) pairs of \((c+\log \ell )\)-bit hash values. The overall bandwidth is optimal for \(c\,{\approx }\,\log \log \ell \), but we report performance numbers for \(c\,{=}\,20\).

Secondly, we use a Secret-Shared PIR (SS-PIR) protocol, which creates a fresh sharing of the j-th record given the shared array and the shared index j. We implement SS-PIR in two rounds from any 2-server PIR [13] whose servers’ PIR responses form an xor-sharing of the retrieved record. Many 2-PIR’s have this property, e.g. [2, 9, 20], but we exemplify this generic construction with the simplest form of 2-server PIR of Chor et al. [9] which has \(3\ell +3D\) bandwidth. This is not optimal in \(\ell \), but in our case \(\ell \,{\le }\,150\,{+}\,b\) where \(b\) is the number of accesses with postponed eviction, the optimized version of SS-PIR sends only \({\approx }\ell {+}3D\) bits on-line, and KSearch already sends \(O(\ell \log \ell )\) bits. Our generic 2-PIR to 3PC-SS-PIR compiler is simple (a variant of it appeared in [20]) but the 3-round 3PC Kw-SS-PIR protocol is to the best of our knowledge novel.
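The xor-sharing property of the linear 2-server PIR referred to above can be seen in a minimal Python sketch. It shows only the plain client-server PIR, in which the client knows the index \(j\) in the clear; it is not the SS-PIR compiler, where both the array and the index are secret-shared. The function names are hypothetical and the records are modelled as equal-length byte strings.

```python
import secrets

def xor_bytes(a, b):
    """Bytewise xor of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def pir_query(ell, j):
    """Client: two random-looking selection vectors that differ only at j."""
    q1 = [secrets.randbelow(2) for _ in range(ell)]
    q2 = list(q1)
    q2[j] ^= 1
    return q1, q2

def pir_answer(db, q):
    """Server: xor of all records whose selection bit is set."""
    acc = bytes(len(db[0]))
    for rec, bit in zip(db, q):
        if bit:
            acc = xor_bytes(acc, rec)
    return acc

# Usage: each server sees a uniformly random vector, and the two
# responses r1, r2 form an xor-sharing of the j-th record.
db = [bytes([i] * 8) for i in range(10)]      # ell = 10 records of D = 8 bytes
q1, q2 = pir_query(len(db), 6)
r1, r2 = pir_answer(db, q1), pir_answer(db, q2)
record = xor_bytes(r1, r2)                    # equals db[6]
```

Each query is \(\ell \) bits and each response is \(D\) bits, so the cost is linear in \(\ell \), which is acceptable here because \(\ell \) is small.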

Fig. 4. Randomization of Circuit-ORAM’s bucket map

Efficient 3PC Circuit-ORAM Eviction. In Eviction we use a simple Data Movement protocol, with 2 rounds and \(\approx 2|\mathsf {path}|\) bandwidth. With three parties denoted as \((\mathsf {C},\mathsf {D},\mathsf {E})\), our protocol creates a two-party \((\mathsf {C},\mathsf {E})\)-sharing of \(\mathsf {path}'=\mathsf{\small {EM}}(\mathsf {path})\) from a \((\mathsf {C},\mathsf {E})\)-sharing of \(\mathsf {path}\) if party \(\mathsf {D}\) holds eviction map \(\mathsf{\small {EM}}\) in the clear. Naively outputting \(\mathsf{\small {EM}}\) to party \(\mathsf {D}\) is insecure, as the eviction map is correlated with the ORAM access pattern, so the question is whether \(\mathsf{\small {EM}}\) can be masked by some randomizing permutation known by \(\mathsf {C}\) and \(\mathsf {E}\). The protocol of [14] had an easy solution for its Binary-Tree ORAM variant because its algorithm Route outputs a regular \(\mathsf{\small {EM}}\), i.e. buckets on every level of the retrieved \(\mathsf {path}\) except the last always move two tuples down to the next level, so all [14] needed to do was to randomly permute tuples on each bucket level of \(\mathsf {path}\), and the resulting new \(\mathsf{\small {EM}}'\) on the permuted \(\mathsf {path}\) leaks no information on \(\mathsf{\small {EM}}\). By contrast, the Circuit-ORAM eviction map is non-regular (see Fig. 4): Its bucket level map \(\varPhi \) of \(\mathsf{\small {EM}}\) can move a tuple by a variable distance and can leave some buckets untouched, both of which are correlated with the density of tuples in \(\mathsf {path}\), and thus with the ORAM access pattern.

Thus our goal is to transform the underlying Circuit-ORAM eviction map \(\mathsf{\small {EM}}= (\varPhi ,\mathsf {t})\) into a map whose distribution does not depend on the data (\(\varPhi \) describes the bucket-level movement, while \(\mathsf {t}\) is an array containing one tuple index from each bucket that will be moved). We do so in two steps. First, we add an extra empty tuple to each bucket and we modify Circuit-ORAM algorithm Route to expand function \(\varPhi :\mathsf {Z}_{d}{\rightarrow }\mathsf {Z}_{d}\cup \{\perp \}\) into a cyclic permutation \(\sigma \) on \(\mathsf {Z}_{d}\) (\(d\) is the depth of \(\mathsf {path}\), \(\mathsf {Z}_{d}\) is the set \(\{0,...,d-1\}\)), by adding spurious edges to \(\varPhi \) in the deterministic way illustrated in Fig. 4. Second, we apply two types of masks to the resulting output \((\sigma ,\mathsf {t})\) of Route, namely a random permutation \(\pi \) on \(\mathsf {Z}_{d}\) and two arrays \((\delta ,\rho )\), each of which contains a random tuple index in each bucket. Our Eviction Logic protocol will use \((\pi ,\delta ,\rho )\) to mask \((\sigma ,\mathsf {t})\) by computing \((\sigma ^{\circ },\mathsf {t}^{\circ })\) s.t. \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) (permutation composition) and \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}\oplus \delta )\). And now we have a masked eviction map \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) that can be revealed to party \(\mathsf {D}\) but does not leak information on \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) or \(\mathsf{\small {EM}}_{\varPhi ,\mathsf {t}}\).
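The two masking formulas can be checked on plain values with a short Python sketch. It operates in the clear on a single machine and only illustrates the algebra: conjugating a cycle \(\sigma \) by \(\pi \) gives another cycle, and the masked index array can be unmasked by whoever knows \((\pi ,\delta ,\rho )\). The helper names are hypothetical; permutations are represented as lists with `p[i]` equal to \(p(i)\), an array permuted by \(\pi \) places item \(i\) at position \(\pi (i)\), and tuple indices are assumed to be \(\log (w{+}1)\)-bit values so that xor keeps them in range.

```python
import random

def compose(p, q):
    """(p . q)(i) = p(q(i))."""
    return [p[q[i]] for i in range(len(q))]

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return inv

def permute_array(p, y):
    """Return p(y), i.e. the array z with z[p(i)] = y[i]."""
    out = [None] * len(y)
    for i, v in enumerate(y):
        out[p[i]] = v
    return out

def mask(sigma, t, pi, delta, rho):
    """sigma_o = pi . sigma . pi^{-1},  t_o = rho xor pi(t xor delta)."""
    sigma_o = compose(compose(pi, sigma), inverse(pi))
    shifted = [a ^ b for a, b in zip(t, delta)]
    t_o = [r ^ v for r, v in zip(rho, permute_array(pi, shifted))]
    return sigma_o, t_o

def is_cycle(p):
    """True iff p is a single cycle through all positions."""
    i, steps = p[0], 1
    while i != 0:
        i, steps = p[i], steps + 1
    return steps == len(p)

# Usage with d = 6 buckets and 2-bit tuple indices (w + 1 = 4).
d, w1 = 6, 4
sigma = [2, 0, 4, 1, 5, 3]                    # a 6-cycle: 0->2->4->5->3->1->0
t = [1, 3, 0, 2, 2, 1]
pi = random.sample(range(d), d)
delta = [random.randrange(w1) for _ in range(d)]
rho = [random.randrange(w1) for _ in range(d)]
sigma_o, t_o = mask(sigma, t, pi, delta, rho)
```

Because conjugation by a uniformly random \(\pi \) maps any fixed \(d\)-cycle to a uniformly random \(d\)-cycle, and \(\delta ,\rho \) act as one-time pads, the pair `(sigma_o, t_o)` has a distribution that does not depend on `(sigma, t)`.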

3 Our Protocol: 3PC Emulation of Circuit-ORAM

Protocol Parties. We use \(\mathsf {C},\mathsf {D},\mathsf {E}\) to denote the three parties participating in 3PC-ORAM. We use \(x^{\mathsf {P}}\) to denote that variable x is known only to party \(\mathsf {P}\in \{\mathsf {C},\mathsf {D},\mathsf {E}\}\), \(x^{\mathsf {P}_1\mathsf {P}_2}\) if x is known to \(\mathsf {P}_1\) and \(\mathsf {P}_2\), and x if known to all parties.

Shared Variables, Bitstrings, Secret-Sharing. Each pair of parties \(\mathsf {P}_1,\mathsf {P}_2\) in our protocol is initialized with a shared seed to a Pseudorandom Generator (PRG), which allows them to generate any number of shared (pseudo)random objects. We write if \(\mathsf {P}_1\) and \(\mathsf {P}_2\) both sample x uniformly from set \(\mathsf {S}\) using the PRG on a jointly held seed. We use several forms of secret-sharing, and here introduce four of them which are used in our high level protocols 3PC-ORAM.Access and 3PC-ORAM.ML (Algorithms 1 and 2):

Integer Ranges, Permutations. We define \(\mathsf {Z}_{n}\) as set \(\{0,...,n{-}1\}\), and \(\mathsf {perm}_{n}\) as the set of permutations on \(\mathsf {Z}_{n}\). If \(\pi ,\sigma \in \mathsf {perm}_{n}\) then \(\pi ^{-1}\) is an inverse permutation of \(\pi \), and \(\pi \cdot \sigma \) is a composition of \(\sigma \) and \(\pi \), i.e. \((\pi \cdot \sigma )(i)=\pi (\sigma (i))\).

Arrays. We use \(\mathsf {array}^{m}[\ell ]\) to denote arrays of \(\ell \) bitstrings of size m, and we write \(\mathsf {array}[\ell ]\) if m is implicit. We use x[i] to denote the i-th item in array x. Note that \(x\in \mathsf {array}^{m}[\ell ]\) can also be viewed as a bitstring in \(\{0{,}1\}^{\ell m}\).

Permutations, Arrays, Array Operations. Permutation \(\sigma \in \mathsf {perm}_{\ell }\) can be viewed as an array \(x\in \mathsf {array}^{\log \ell }[\ell ]\), i.e. \(x=[\sigma (0),...,\sigma (\ell {-}1)]\). For \(\pi \in \mathsf {perm}_{\ell }\) and \(y\in \mathsf {array}[\ell ]\) we use \(\pi (y)\) to denote an array containing elements of y permuted according to \(\pi \), i.e. \(\pi (y)=[y_{\pi ^{-1}(0)},...,y_{\pi ^{-1}(\ell -1)}]\).

Garbled Circuit Wire Keys. If variable \(x\in \{0{,}1\}^m\) is an input/output in circuit C, and \(\mathsf {wk}\in \mathsf {array}^{\kappa }[m,2]\) is the set of wire key pairs corresponding to this variable in the garbled version of C, then \({}\{\mathsf {wk}\,{:}\,x\}\in \mathsf {array}^{\kappa }[m]\) denotes the wire-key representation of value x on these wires, i.e. \({}\{\mathsf {wk}\,{:}\,x\}=\{\mathsf {wk}[i][x[i]]\}_{i=1}^m\). If the set of keys is implicit we will denote \({}\{\mathsf {wk}\,{:}\,x\}\) as \(\overline{x}\).
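The wire-key representation amounts to selecting, for every bit of \(x\), one key out of that wire's pair. A minimal Python sketch, assuming \(\kappa =128\) and hypothetical helper names, is given below; it does not garble any gates and only shows the encoding and the decoding by a party that holds all key pairs.

```python
import secrets

KAPPA_BYTES = 16                               # kappa = 128 bits (assumed value)

def gen_wire_keys(m):
    """wk[i] = (key for bit 0, key for bit 1) on wire i."""
    return [(secrets.token_bytes(KAPPA_BYTES), secrets.token_bytes(KAPPA_BYTES))
            for _ in range(m)]

def encode(wk, x_bits):
    """Wire-key representation {wk : x}: pick wk[i][x[i]] on every wire."""
    return [wk[i][x_bits[i]] for i in range(len(x_bits))]

def decode(wk, keys):
    """The holder of wk recovers each bit by matching the received key."""
    return [wk[i].index(k) for i, k in enumerate(keys)]

# Usage: a party that sees only x_bar learns nothing about x,
# while the party holding wk can translate x_bar back to x.
x = [1, 0, 1, 1]
wk = gen_wire_keys(len(x))
x_bar = encode(wk, x)
```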


3PC ORAM Protocol. Our 3PC ORAM protocol, 3PC-ORAM.Access, Algorithm 1, performs the same recursive scan through data-structure \(\mathsf {tree}_0,...,\mathsf {tree}_{h{-}1}\) as the client-server Path-ORAM (and Circuit-ORAM) described in Sect. 2, except it runs on inputs in secret-sharing format. The main loop of 3PC-ORAM.Access, i.e. protocol 3PC-ORAM.ML, Algorithm 2, also follows the corresponding client-server algorithm ORAM.ML, except that apart from the current-level leaf label \(\mathrm {L}\) which is known to all parties, all its other inputs are secret-shared as well.

Protocol 3PC-ORAM.ML calls subprotocols whose round/bandwidth specifications are stated in Fig. 5. (We omit computation costs because they are all comparable to link-encryption of communicated data). The low costs of these subprotocols are enabled by different forms of secret-sharings, e.g. xor versus additive, or 2-party versus 3-party, and by low-cost (or no cost) conversions between them. For implementations of these protocols we refer to [21] Appendix A.

Fig. 5. Round and bandwidth for subprotocols of Algorithm 2, for \(\ell \) the number of tuples on \(\mathsf {path}\) and x the circuit input size (\(\approx \ell (d+{\log n}) + d\log (w+1)\))

Three Phases of 3PC-ORAM.ML: Protocol 3PC-ORAM.ML computes on a sharing of \(\mathsf {path}=\mathsf {tree}.\mathsf {path}(\mathrm {L})\) and it contains the same three phases as the client-server Path-ORAM, but implemented with specialized 3PC protocols:

(1) Retrieval runs protocol KSearch to compute a “shift” (i.e. additive) sharing of index \(j\) for tuple \(\mathsf {T}\,{=}\,\mathsf {path}[j]\) in \(\mathsf {path}\) s.t. \(\mathsf {path}[j].\mathsf {adr}\,{=}\,\mathrm {N}\) and \(\mathsf {path}[j].\mathsf {fb}\,{=}\,1\), i.e. it is the unique (and non-empty) tuple pertaining to address prefix \(\mathrm {N}\); Then it runs protocol 3ShiftPIR to extract a sharing of the payload \(X=\mathsf {path}[j].\mathsf {data}\) of this tuple, given the sharings of \(\mathsf {path}\) and of index \(j\); In parallel to \(\mathsf{\small {3ShiftPIR}}\) it also runs protocol 3ShiftXorPIR to publicly reconstruct the next-level label stored at position \(\varDelta {\mathrm {N}}\) in this tuple’s payload, i.e. \(\mathrm {L}_{i+1}\,{=}\,(\mathsf {path}[j].\mathsf {data})[\varDelta {\mathrm {N}}]\), given the corresponding input sharings. This construction of the Retrieval emulation allows for presenting protocols 3ShiftPIR and 3ShiftXorPIR (see resp. Algorithms 9 and 11 in [21], Appendix A) as generic SS-PIR constructions from a class of 2-Server PIR protocols. However, a small modification of this design achieves better round and on-line bandwidth parameters, see the Optimizations and Efficiency Discussion paragraph below.

(2) Post-process runs the Update-Label-in-Tuple protocol ULiT to form a sharing of a new tuple using the sharing of the retrieved tuple’s payload, the sharings of the address prefix and the next address chunk, and the sharings of new labels; In parallel to \(\mathsf{\small {ULiT}}\) it also runs protocol FlipFlag to flip the full/empty flag to 0 in the old version of this tuple in \(\mathsf {path}\), which executes on inputs the sharings of field \(\mathsf {fb}\) of tuples in \(\mathsf {path}\) and on the “shift” sharing of index \(j\); Once ULiT terminates the parties can insert the sharing of the new tuple into the sharing of the root bucket in \(\mathsf {path}\). At this point the root bucket has size \(s{+}1\) (or \(s{+}b\) if we postpone eviction for a batch of \(b\) accesses).

(3) Eviction emulates Circuit-ORAM eviction on the sharing of the path involved in retrieval (or another path because 3PC-ORAM.Access, just like client-server Circuit-ORAM, performs eviction on two paths per access). It uses the generic garbled circuit protocol GC(Route) to compute the Circuit-ORAM eviction map (appropriately masked), and then runs protocols PermBuckets, PermTuples, and XOT to apply this (masked) eviction map to the secret-shared path. We discuss the eviction steps in more detail below.

Eviction Procedure. As we explain in Sect. 2, we split Eviction into Eviction Logic, which uses garbled circuit subprotocol to compute the eviction map \(\mathsf{\small {EM}}\), and Eviction Movement, which uses special-purpose protocols to apply \(\mathsf{\small {EM}}\) to the shared path, which in protocol 3PC-ORAM.ML will be . However, recall that revealing the eviction map to any party would leak information about path density, and consequently the access pattern. We avoid this leakage in two steps: First, we modify the Circuit-ORAM eviction logic computation Route, so that when it computes bucket-level map \(\varPhi \) and the tuple pointers array \(\mathsf {t}\), which define an eviction map \(\mathsf{\small {EM}}_{\varPhi ,\mathsf {t}}\), the algorithm scans through the buckets once more to expand the partial map \(\varPhi \) into a complete cycle \(\sigma \) over the \(d\) buckets (see Fig. 4). (We include the modified Circuit-ORAM algorithm Route in [21], Appendix D.) Second, the garbled circuit computation \(\mathsf{\small {GC}}(\mathsf{\small {Route}})\), see Step 6, Algorithm 2, does not output \((\sigma ,\mathsf {t})\) to \(\mathsf {D}\) in the clear: Instead, it outputs \(\mathsf {t}'\,{=}\,\mathsf {t}\oplus \delta \) where \(\delta \) is a random array, used here as a one-time pad, and the garbled wire encoding of the bits of \(\sigma \,{=}\,[\sigma (1),...,\sigma (d)]\), i.e. the output wire keys \({}\{\mathsf {wk}\,{:}\,\sigma \}\,{=}\,\{\mathsf {wk}[i][\sigma [i]]\}_{i=1}^{d\log d}\).

Recall that we want \(\mathsf {D}\) to compute \((\sigma ^{\circ },\mathsf {t}^{\circ })\), a masked version of \((\sigma ,\mathsf {t})\), where \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) and \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}\oplus \delta )\), for \(\pi \) a random permutation on \(\mathsf {Z}_{d}\) and \(\delta ,\rho \) random arrays, all picked by \(\mathsf {C}\) and \(\mathsf {E}\). This is done by protocol PermBuckets, which takes 2 on-line rounds to let \(\mathsf {D}\) translate \(\{\mathsf {wk}\,{:}\,\sigma \}\) into \(\sigma ^{\circ }\,{=}\,\pi \cdot \sigma \cdot \pi ^{-1}\) given \(\mathsf {wk}\) held by \(\mathsf {E}\) and \(\pi \) held by \(\mathsf {C},\mathsf {E}\), and (in parallel) by protocol PermTuples, which takes 2 rounds to let \(\mathsf {D}\) translate \(\mathsf {t}'\,{=}\,\mathsf {t}\oplus \delta \) into \(\mathsf {t}^{\circ }\,{=}\,\rho \oplus \pi (\mathsf {t}')\) given \(\pi ,\rho \) held by \(\mathsf {C},\mathsf {E}\). Then \(\mathsf {C},\mathsf {E}\) permute (implied by , because ) by \(\varPi \,{=}\,\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\), where \(\ddot{\pi }\), \(\tilde{\delta }\), and \(\tilde{\rho }\) are permutations on the \(\ell =d\cdot (w{+}1)\) tuples in the path induced by \(\pi ,\delta ,\rho \):

  • \(\pi \in \mathsf {perm}_{d}\) defines \(\ddot{\pi }\in \mathsf {perm}_{\ell }\) s.t. \(\ddot{\pi }(j,t)=(\pi (j),t)\), i.e. \(\ddot{\pi }\) moves position t in bucket j to position t in bucket \(\pi (j)\);

  • \(\delta \in \mathsf {array}^{\log {(w{+}1)}}[d]\) defines \(\tilde{\delta }\in \mathsf {perm}_{\ell }\) s.t. \(\tilde{\delta }(j,t)=(j,t\oplus \delta [j])\), i.e. \(\tilde{\delta }\) moves position t in bucket j to position \(t\oplus \delta [j]\) in bucket j; same for \(\rho \) and \(\tilde{\rho }\).
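The two induced permutations can be made concrete with a small sketch. The Python below is illustrative only: the helper names `lift_bucket_perm` and `lift_xor_mask`, the toy sizes, and the representation of a tuple location as a pair (bucket, position) are assumptions made for this example and are not part of the protocol specification.

```python
import random

def lift_bucket_perm(pi):
    """Return pi-ddot: moves position t in bucket j to position t in bucket pi[j]."""
    return lambda j, t: (pi[j], t)

def lift_xor_mask(delta):
    """Return delta-tilde: moves position t in bucket j to position t XOR delta[j]."""
    return lambda j, t: (j, t ^ delta[j])

# Toy sizes (assumption): d buckets, slots = w + 1 positions per bucket.
# slots is taken to be a power of two so that XOR stays inside the bucket.
d, slots = 4, 4
rng = random.Random(0)
pi = list(range(d))
rng.shuffle(pi)
delta = [rng.randrange(slots) for _ in range(d)]

pi_ddot = lift_bucket_perm(pi)
delta_tilde = lift_xor_mask(delta)
points = [(j, t) for j in range(d) for t in range(slots)]  # all l = d*(w+1) locations

# Both lifted maps are permutations of the l tuple locations.
is_perm_pi = sorted(pi_ddot(j, t) for j, t in points) == points
is_perm_delta = sorted(delta_tilde(j, t) for j, t in points) == points
# An XOR mask is its own inverse.
is_involution = all(delta_tilde(*delta_tilde(j, t)) == (j, t) for j, t in points)
```

In this sketch the XOR-induced map undoes itself when applied twice, whereas undoing the bucket-level map requires the inverse permutation of `pi`.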

Now use protocol XOT in 2 rounds and \({\approx }\,2|\mathsf {path}|\) bandwidth to apply map \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) held by \(\mathsf {D}\) to . The result is for , and when \(\mathsf {C},\mathsf {E}\) apply \(\varPi ^{-1}\) to it they get for . Finally can be reconstructed from in 1 round and \(2|\mathsf {path}|\) bandwidth (see [21], Appendix A for secret-sharing conversions and reasoning), and can then be injected into .

Eviction Correctness. We claim that the eviction protocol described above implements mapping \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) applied to \(\mathsf {path}\), i.e. that (note that \((\tilde{x})^{-1}=\tilde{x}\)):

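\[\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\,=\,\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \quad \text{for}\quad \varPi \,=\,\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta } \tag{1}\]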

Consider the set of points \(S=\{(j,\mathsf {t}[j])\, |\, j\in \mathsf {Z}_{d}\}\) which are moved by the left hand side (LHS) permutation \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\). To argue that Eq. (1) holds we first show that the RHS permutation maps any point \((j,\mathsf {t}[j])\) of S in the same way as the LHS permutation:

It remains to argue that RHS is an identity on points not in S, just like LHS. Observe that set \(S'\) of tuples moved by \(\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\) consists of the following tuples:

Also note that:

which means that \(S'\,{=}\,\varPi (S)\), so if \((j,t)\,{\not \in }\,S\) then \(\varPi (j,t)\,{\not \in }\,S'\), hence \((\mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi )(j,t)\,{=}\,\varPi (j,t)\), and hence \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi \) and \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) are equal on \((j,t)\,{\not \in }\,S\).
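The identity \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi =\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) with \(\varPi =\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\) can also be checked numerically on small instances. The sketch below is not the protocol itself; it rests on two assumptions that are not spelled out in this section: that \(\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) moves the tuple at \((j,\mathsf {t}[j])\) to \((\sigma (j),\mathsf {t}[\sigma (j)])\) and fixes every other location, and that applying \(\pi \) to an array \(a\) means \(\pi (a)[\pi (j)]=a[j]\). All function names and sizes are hypothetical.

```python
import random

def random_cycle(d, rng):
    """Random permutation of range(d) that is a single d-cycle."""
    order = list(range(d))
    rng.shuffle(order)
    sigma = [0] * d
    for i, j in enumerate(order):
        sigma[j] = order[(i + 1) % d]
    return sigma

def eviction_map(sigma, t):
    """Assumed EM_{sigma,t}: (j, t[j]) -> (sigma[j], t[sigma[j]]), identity elsewhere."""
    def em(j, pos):
        return (sigma[j], t[sigma[j]]) if pos == t[j] else (j, pos)
    return em

def check_eq1(d=5, slots=4, seed=0):
    """Return True if Pi^{-1} . EM_masked . Pi equals EM on every tuple location."""
    rng = random.Random(seed)
    sigma = random_cycle(d, rng)
    t = [rng.randrange(slots) for _ in range(d)]
    pi = list(range(d))
    rng.shuffle(pi)
    delta = [rng.randrange(slots) for _ in range(d)]
    rho = [rng.randrange(slots) for _ in range(d)]
    pi_inv = [0] * d
    for j in range(d):
        pi_inv[pi[j]] = j

    # Masked values as seen by D: sigma_m = pi . sigma . pi^{-1},
    # t_m = rho XOR pi(t XOR delta).
    sigma_m = [pi[sigma[pi_inv[j]]] for j in range(d)]
    t_m = [rho[j] ^ t[pi_inv[j]] ^ delta[pi_inv[j]] for j in range(d)]

    def big_pi(j, pos):
        # rho-tilde . pi-ddot . delta-tilde, applied right to left
        return (pi[j], pos ^ delta[j] ^ rho[pi[j]])

    def big_pi_inv(j, pos):
        k = pi_inv[j]
        return (k, pos ^ rho[j] ^ delta[k])

    em = eviction_map(sigma, t)
    em_m = eviction_map(sigma_m, t_m)
    return all(
        big_pi_inv(*em_m(*big_pi(j, p))) == em(j, p)
        for j in range(d)
        for p in range(slots)
    )
```

Calling `check_eq1` with different seeds and sizes (with `slots` a power of two) exercises both halves of the argument above: locations in \(S\) are moved consistently, and locations outside \(S\) are fixed.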

Optimizations and Efficiency. As mentioned above, we can improve on both bandwidth and rounds in the Retrieval phase of 3PC-ORAM.ML shown in Algorithm 2. The optimization comes from the observation that our protocol KSearch (see Algorithm 6, Appendix A) takes just one round to compute a shift-sharing of index j, and its second round is a resharing which transforms into . This round of resharing can be saved, and we can re-arrange protocols 3ShiftPIR and 3ShiftXorPIR so they use only as input and effectively piggyback creating the rest of in such a way that the modified protocols, denoted resp. 3ShiftPIR-Mod and 3ShiftXorPIR-Mod, take 2 rounds. This makes the whole Retrieval take only 3 rounds, hence access protocol 3PC-ORAM.Access takes \(3h\) rounds in Retrieval, and, surprisingly, the same is true for Retrieval with Post-Processing. For further explanations we refer to [21].

4 Security

Protocol 3PC-ORAM of Sect. 3 is a three-party secure computation of an Oblivious RAM functionality, i.e. it can implement RAM for any 3PC protocol in the RAM model. To state this formally we define a Universally Composable (UC) Oblivious RAM functionality \(\mathsf {F}_{\mathsf {ORAM}}\) for 3-party computation (3PC) in the framework of Canetti [7], and we argue that our 3PC ORAM realizes \(\mathsf {F}_{\mathsf {ORAM}}\) in the setting of \(m\,{=}\,3\) parties with an honest majority, i.e. only \(t\,{=}\,1\) party is (statically) corrupted, assuming an honest-but-curious (HbC) adversary, i.e. the corrupted party follows the protocol. We assume secure pairwise links between the three parties. Since we have static corruptions, an HbC adversary, and non-rewinding simulators, security holds even if communication is asynchronous.

3PC ORAM Functionality. Functionality \(\mathsf {F}_{\mathsf {ORAM}}\) is parametrized by address and record sizes, resp. \({\log n}\) and \(D\). It takes command \(\mathsf {Init}\), which initializes an empty array \(\mathsf {M}\in \mathsf {array}^{D}[n]\), and command \(\mathsf {Access}(\mathsf {instr},\mathrm {N},\mathsf {rec}')\) for \((\mathsf {instr},\mathrm {N},\mathsf {rec}')\in \{\mathsf {read},\mathsf {write}\}\times \{0{,}1\}^{{\log n}}\times \{0{,}1\}^{D}\), which returns a fresh secret-sharing of record \(\mathsf {rec}\,{=}\,\mathsf {M}[\mathrm {N}]\), and if \(\mathsf {instr}\,{=}\,\mathsf {write}\) it also assigns \(\mathsf {M}[\mathrm {N}]\,{:=}\,\mathsf {rec}'\). Technically, \(\mathsf {F}_{\mathsf {ORAM}}\) needs each of the three participating parties to make the call, where each party provides its part of the sharing, and \(\mathsf {F}_{\mathsf {ORAM}}\)’s output is also delivered in the form of a corresponding share to each party. However, in the HbC setting all parties are assumed to follow the instructions provided by an environment algorithm \(\mathsf {Z}\), which models a higher-level protocol that utilizes \(\mathsf {F}_{\mathsf {ORAM}}\) to implement oblivious memory access. Hence we can simply assume that \(\mathsf {Z}\) sends \(\mathsf {Init}\) and \(\mathsf {Access}(\mathsf {instr},\mathrm {N},\mathsf {rec}')\) to \(\mathsf {F}_{\mathsf {ORAM}}\) and receives \(\mathsf {rec}\) in return.
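As an informal illustration of the input/output behavior just described, the following Python sketch models the functionality as a trusted object. It is a toy under stated assumptions, not the formal definition: the class name `IdealORAM`, the use of a three-way XOR sharing for the output, the measurement of the record size in bytes rather than bits, and the choice to return the record stored before a write takes effect are all hypothetical choices made for this example.

```python
import secrets

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def share3(rec):
    """Fresh 3-way XOR sharing of rec (assumed sharing format)."""
    s1 = secrets.token_bytes(len(rec))
    s2 = secrets.token_bytes(len(rec))
    s3 = xor_bytes(xor_bytes(rec, s1), s2)
    return s1, s2, s3

def reconstruct(shares):
    """Recombine the three XOR shares into the record."""
    s1, s2, s3 = shares
    return xor_bytes(xor_bytes(s1, s2), s3)

class IdealORAM:
    """Toy model of the ideal functionality: a trusted party holding array M."""

    def __init__(self, log_n, record_bytes):
        self.n = 1 << log_n
        self.record_bytes = record_bytes  # record size, in bytes in this toy
        self.M = None

    def init(self):
        # Init: create an empty (all-zero) array of n records.
        self.M = [bytes(self.record_bytes) for _ in range(self.n)]

    def access(self, instr, N, new_rec=None):
        # Access: return a fresh sharing of M[N]; on write also set M[N] := new_rec.
        if instr not in ("read", "write"):
            raise ValueError("instr must be 'read' or 'write'")
        rec = self.M[N]
        if instr == "write":
            if new_rec is None or len(new_rec) != self.record_bytes:
                raise ValueError("new_rec must have the configured record size")
            self.M[N] = new_rec
        return share3(rec)
```

A caller playing the role of the environment would invoke `init()` once and then `access(...)` repeatedly, reconstructing each returned sharing with `reconstruct`.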

Security of our 3PC ORAM. Since our protocol is a three-party secure emulation of Circuit-ORAM, we prove that it securely realizes \(\mathsf {F}_{\mathsf {ORAM}}\) in the \((t,m)\,{=}\,(1,3)\) setting if Circuit-ORAM defines a secure Client-Server ORAM, which implies security of 3PC-ORAM by the argument for Circuit-ORAM security given in [27]. We note that protocol 3PC-ORAM.Access of Sect. 3 implements only procedure \(\mathsf {Access}\). Procedure \(\mathsf {Init}\) can be implemented by running 3PC-ORAM.Access with \(\mathsf {instr}\,{=}\,\mathsf {write}\) in a loop for \(\mathrm {N}\) from 0 to \(n{-}1\) (and arbitrary \(\mathsf {rec}'\)’s), but this requires some adjustments in 3PC-ORAM.Access and 3PC-ORAM.ML to deal with initialization of random label assignments and their linkage. We leave the specification of these (straightforward) adjustments to the full version, and our main security claim, stated as Corollary 1 below, assumes that \(\mathsf {Init}\) is executed by a trusted party.

For lack of space we defer the proof of Corollary 1 to [21], Appendix C. Very briefly, the proof uses the UC framework, arguing that each protocol securely realizes its intended input/output functionality if each subprotocol it invokes realizes its idealized input/output functionality. All subprotocols executed by protocol 3PC-ORAM.ML of Sect. 3 are accompanied by brief security arguments which argue precisely this statement. As for 3PC-ORAM.ML, its security proof, given in [21], Appendix C, is centered around two facts argued in Sect. 3, namely that our way of implementing the Circuit-ORAM eviction map, with \(\mathsf {D}\) holding \(\sigma ^{\circ }=\pi \cdot \sigma \cdot \pi ^{-1}\) and \(\mathsf {t}^{\circ }=\rho \oplus \pi (\mathsf {t}\oplus \delta )\) and \(\mathsf {E},\mathsf {C}\) holding \(\pi ,\rho ,\delta \), is (1) correct, because \(\varPi ^{-1}\cdot \mathsf{\small {EM}}_{\sigma ^{\circ },\mathsf {t}^{\circ }}\cdot \varPi =\mathsf{\small {EM}}_{\sigma ,\mathsf {t}}\) for \(\varPi =\tilde{\rho }\cdot \ddot{\pi }\cdot \tilde{\delta }\), and (2) it leaks no information to either party, because random \(\pi ,\rho ,\delta \) induce random \(\sigma ^{\circ },\mathsf {t}^{\circ }\) in \(\mathsf {D}\)’s view.

Corollary 1 (from [21], Appendix C). Assuming secure initialization, 3PC-ORAM.Access is a UC-secure realization of 3PC ORAM functionality \(\mathsf {F}_{\mathsf {ORAM}}\).

5 Performance Evaluation

We tested a Java prototype of our 3PC Circuit-ORAM, with garbled circuits implemented using the ObliVM library by Wang [27], on three AWS EC2 c4.2xlarge servers, with communication links encrypted using AES-128. Each c4.2xlarge instance is equipped with eight Intel Xeon E5-2666 v3 CPUs (2.9 GHz) and 15 GB of memory, and has 1 Gbps bandwidth. (However, our tested prototype utilizes multi-threading only in parallel Eviction, see below.)

In the discussion below we use the following acronyms:

  - cust-3PC: our 3PC Circuit-ORAM protocol;

  - gen-3PC: generic 3PC Circuit-ORAM using 3PC of Araki et al. [1];

  - 2PC: 2PC Circuit-ORAM [27];

  - C/S: the client-server Path-ORAM [26].

Fig. 6. Our 3PC-ORAM online wall-clock time (ms) vs \({\log n}\) for \(D=4\)B

Fig. 7. CPU time (ms) vs \({\log n}\) for \(D=4\)B

Fig. 8. Online bandwidth (MB) vs \({\log n}\) for \(D=4\)B

Fig. 9. Comparison with 2PC-ORAMs in online\(+\)offline bandwidth (MB) vs \({\log n}\) for \(D=4\)B

Wall Clock Time. Figure 6 shows online timing of cust-3PC for small record sizes (\(D\,{=}\,4\)B) as a function of address size \({\log n}\). It includes Retrieval wall clock time (WC), End-to-End (Retrieval+PostProcess+Eviction) WC, and End-to-End WC with parallel Eviction for all trees, which shows a 60% reduction in WC due to better CPU utilization. Note that Retrieval takes about 8 milliseconds for \({\log n}\,{=}\,30\) (i.e. \(2^{30}\) records), and that Eviction takes only about 4–5 times longer. Recall that the Retrieval phase has \(3h\) rounds while Eviction has \(6\), which accounts for the much smaller CPU utilization in Retrieval.

CPU Time. We compare total and online CPU time of cust-3PC and 2PC in Fig. 7 with respect to memory size \(n\), for \(D=4\)B. Since the 2PC implementation [27] does not provide online/offline separation, we approximate 2PC online CPU time by its garbled circuit evaluation time, because 2PC costs due to OTs can be pushed to precomputation. As Fig. 7 shows, the cust-3PC CPU costs are between 6x and 10x lower than in 2PC, resp. online and total, already for \({\log n}=25\), and the gap widens for higher \(n\). In [21], Appendix E.2 we include a CPU time comparison with respect to \(D\), which shows that the CPU ratio of 2PC over cust-3PC grows to \({\approx }\,25\) for \(D\ge 10\) KB.

Bandwidth Comparison with Generic 3PC. Timing results depend on many factors (language, network, CPU, and more), and bandwidth is a more reliable predictor of performance for protocols using only light symmetric crypto. In Fig. 8 we compare online bandwidth of cust-3PC, gen-3PC, and C/S, as a function of the address size \({\log n}\), for \(D=4\)B. We see that for small records our cust-3PC is only a factor of 2 worse than the optimal-bandwidth gen-3PC (which, recall, has completely impractical round complexity). In [21], Appendix E.2 we show that as \(D\) grows, cust-3PC beats gen-3PC in bandwidth for \(D{\ge }1\) KB.

Bandwidth Comparison with 2PC ORAMs. In Fig. 9 we compare total bandwidth of cust-3PC and several 2PC ORAM schemes, including 2PC, the DPF-based FLORAM scheme of [12], the 2PC SQRT-ORAM of [30], and a trivial linear-scan scheme. Our cust-3PC bandwidth is competitive with FLORAM for all \(n\)’s, but for \({\log n}\,{\ge }\,24\) the \(O(\sqrt{n})\) asymptotics of FLORAM take over. Note also that FLORAM uses \(O(n)\) local computation vs. our \(O({{\log ^3}n})\), so in the FLORAM case a bandwidth comparison does not suffice. Indeed, for \(n=2^{30}\) and \(D=4\)B, [12] report \(>1\) s overall processing time on LAN vs. 40 ms for us.

For further discussion of bandwidth and CPU time with respect to record size \(D\), and of the cust-3PC CPU time components, refer to [21], Appendix E.2.