1 Introduction

1.1 Background

Actively secure multiparty computation (in the dishonest majority setting) allows n parties to compute an arbitrary function over their private inputs while preserving the privacy of the parties and the correctness of the computation, even in the presence of a malicious adversary who may corrupt an arbitrary strict subset of the parties.

The field of secure two-party (2PC) and multiparty (MPC) computation has a rich literature, starting with Yao [40] and Goldreich-Micali-Wigderson [16], and has attracted much interest during the past decade due to advances in efficiency and fast implementations [8, 23, 26, 37, 38]. Nevertheless, almost all previous works require the parties to first “unroll” the function into an arithmetic or Boolean circuit representation and then securely evaluate the circuit gate by gate. This is in contrast to the modern design of algorithms of practical interest (e.g., binary search, Dijkstra’s shortest-path algorithm, Gale-Shapley stable matching, etc.), which are typically expressed as Random Access Machine (RAM) programs containing branches, recursion, loops, etc., and which exploit O(1) access to memory rather than circuits. In the following we provide the necessary overview of the RAM model of computation and how it is securely realized.

RAM Model of Computation. RAM is classically modeled as a protocol carried out between two entities: \(\mathsf {CPU}\) and \(\mathsf {MEMORY}\), which are essentially two polynomial-time Turing machines with unbalanced storage capacities: the \(\mathsf {CPU}\) usually stores a small amount of data, corresponding to the state of the program, which is logarithmic in the amount of storage in \(\mathsf {MEMORY}\) required by the program. We denote the \(\mathsf {CPU}\)’s storage by d and the \(\mathsf {MEMORY}\)’s storage by D, such that \(|D|=N\) and \(|d|=O(\log N)\). We denote a memory block at address i by D[i]. During the program execution, \(\mathsf {CPU}\) typically chooses to perform one instruction \(\textit{I}\) out of a finite instruction set \(\textit{IS}\). A program \(\varPi \) and an input \(\varvec{x}\) are first loaded into the storage of \(\mathsf {MEMORY}\), and then the \(\mathsf {CPU}\) is triggered to start working. From that point, \(\mathsf {CPU}\) and \(\mathsf {MEMORY}\) are engaged in a protocol with T rounds, where T is the running time of \(\varPi \). In the t-th round:

  1.

    \(\mathsf {CPU}\) computes the CPU-step function:

    $$\begin{aligned} \mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}(\mathsf {state}_{t} ,b_{t}^{\mathsf {read} }) = (\mathsf {state}_{t+1} ,i_{t}^{\mathsf {read} },i_{t}^{\mathsf {write} },b_{t}^{\mathsf {write} }) \end{aligned}$$
    (1)

    by executing instruction \(\textit{I}_t \in \textit{IS}\). The input \(\mathsf {state}_{t} \) is the current state of the program (registers etc.), and \(b_{t}^{\mathsf {read} }\) is the block that was most recently loaded from \(\mathsf {MEMORY}\). The outputs of the CPU-step are: the new program state \(\mathsf {state}_{t+1} \), the address \(i_{t}^{\mathsf {read} }\) in D to read from, and the address \(i_{t}^{\mathsf {write} }\) in D to which the block \(b_{t}^{\mathsf {write} }\) is written.

  2.

    \(\mathsf {CPU}\) sends \((i_{t}^{\mathsf {read} },i_{t}^{\mathsf {write} },b_{t}^{\mathsf {write} })\) to \(\mathsf {MEMORY}\). We define \(\mathsf {access}_t \triangleq (i_{t}^{\mathsf {read} },i_{t}^{\mathsf {write} })\).

  3.

    \(\mathsf {MEMORY}\) sends data block \(D[i_{t}^{\mathsf {read} }]\) to \(\mathsf {CPU}\) and assigns \(D[i_{t}^{\mathsf {write} }]=b_{t}^{\mathsf {write} }\).

In every such round, \(\mathsf {CPU}\) is said to make a single request, or logical access, to \(\mathsf {MEMORY}\). The output of the protocol, denoted \(\varvec{y}=\varPi (D,\varvec{x})\), is the result of the computation of the program \(\varPi \) on input \(\varvec{x}\) and memory D, where \(\mathsf {CPU}\) sets \(\varvec{y}\) to the last state of the program, \(\mathsf {state}_{T+1} \). The sequence of accesses \(\{\mathsf {access}_1,\ldots ,\mathsf {access}_T \}\) is called the access pattern of \(\varPi \) on input \(\varvec{x}\) and memory D (of size N) and is denoted \(\mathsf {AP}(\varPi ,D,\varvec{x})\). Similarly, the sequence \(\{\textit{I}_1,\ldots ,\textit{I}_T\}\) is called the instruction pattern and is denoted \(\mathsf {IP}(\varPi ,D,\varvec{x})\).
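For concreteness, the round structure above can be sketched as a minimal, in-the-clear Python simulation. The example instruction set, state layout, and the convention of writing the running result to address 0 are our own illustration, not part of the model:

```python
def run_ram(cpu_step, D, state, T):
    """Drive the T-round CPU/MEMORY protocol and return y = state_{T+1}."""
    b_read = 0  # no block has been loaded before round 1
    for _ in range(T):
        # CPU-step: C_CPU(state_t, b_read) -> (state_{t+1}, i_read, i_write, b_write)
        state, i_read, i_write, b_write = cpu_step(state, b_read)
        # MEMORY answers the read, then applies the write (step 3 above)
        b_read = D[i_read]
        D[i_write] = b_write
    return state

# Example program: sum the N memory blocks; state = (program counter, accumulator).
N_MEM = 5

def sum_step(state, b_read):
    pc, acc = state
    acc += b_read                              # consume the block read last round
    return (pc + 1, acc), pc % N_MEM, 0, acc   # request next block, log sum to D[0]

D = [3, 1, 4, 1, 5]
final_pc, y = run_ram(sum_step, D, (0, 0), T=N_MEM + 1)
print(y)  # 14 = 3+1+4+1+5
```

Note that \(T=N+1\) rounds are needed because the block requested in round t is only consumed in round \(t+1\).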

The general methodology for designing secure multiparty computation directly for RAM programs is to have the parties take both the role of \(\mathsf {CPU}\) and \(\mathsf {MEMORY}\) and sequentially evaluate sufficiently many copies of the \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\) function. Upon completing the evaluation of one copy, the parties access D according to \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\)’s output \((i_{t}^{\mathsf {read} },i_{t}^{\mathsf {write} },b_{t}^{\mathsf {write} })\) and obtain the input \(b_{t}^{\mathsf {read} }\) to the next copy.

Obviously, a secure protocol must not reveal D to the parties; otherwise it would be possible to learn information about the parties’ inputs. A trivial way to avoid this is to embed two sub-procedures inside \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\): one to encrypt (and authenticate) \(b^{\mathsf {write} }\) before it is output, and one to decrypt (and verify the authenticity of) \(b^{\mathsf {read} }\) before it is used by \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\). This enhanced function is denoted \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}\). Let \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}^1,\ldots ,\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}^T\) be garbled versions of \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}\). The parties feed their inputs \(\varvec{x}=x^1,\ldots ,x^n\) into \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}^1\), taking the place of the wires associated with \(\mathsf {state}_{1} \), and sequentially evaluate the garbled circuits to obtain \(\varvec{y}=\mathsf {state}_{T+1} \). This way, even an adversary who can tap (or even tamper with) the memory accesses is unable to manipulate the program into operating over forged data (since the data blocks are authenticated); yet the access pattern might still reveal information about the parties’ inputs or the program’s state.

ORAM in Secure RAM Computation. Previous works on 2PC and MPC for RAM programs [1, 11, 14, 18,19,20, 22, 24, 28,29,30,31, 39] use Oblivious RAM (ORAM) as an important building block. Informally speaking, an ORAM scheme is a technique to transform a program \(\varPi \) with runtime T and initial storage D into a new program \(\varPi '\) with runtime \(T'\) and initial storage \(D'\) such that the access pattern \(\mathsf {AP}(\varPi ',D',\varvec{x})\) appears independent of both \(\varPi \) and \(\varvec{x}\), yet both programs compute the same function, i.e., \(\varPi (D,\varvec{x})=\varPi '(D',\varvec{x})\) for all \(\varvec{x}\). All ORAM schemes that we know of work by first initializing the storage and then simulating each memory access individually online (i.e., we know of no scheme that simulates a batch of accesses at once). Since the work of Goldreich and Ostrovsky [17], it has been known that simulating a single memory access of \(\varPi \) (denoted a logical access above) can incur only \(\mathsf {poly}(\log N)\) memory accesses in \(\varPi '\), denoted physical accesses, which leads to the same runtime overhead, that is, \(T'=T\cdot \mathsf {poly}(\log N)\). The same overhead can be obtained for the memory consumption of \(\varPi '\), that is, \(N'=N\cdot \mathsf {poly}(\log N)\).

The general methodology for secure computation of RAM programs using an ORAM scheme is to have the parties collaboratively compute an ORAM transformation of \(\varPi \) and D (via any MPC protocol) to obtain \(\varPi '\) and \(D'\). This is a one-time step whose computational and communication complexity is proportional to \(N'\). Then, they engage in a protocol of \(T'\) steps to compute \(\varPi '(D',\varvec{x})\) and, as before, obtain the output as \(\mathsf {state}_{T'+1} \).

This way, to securely compute a program, it is no longer required to unroll it into a circuit; rather, it is enough to unroll only the ORAM scheme’s algorithms and the CPU-step function. Consequently, this approach may concentrate research efforts on optimizing a specific set of ORAM algorithms instead of seeking optimizations for the circuit version of each individual program.

Oblivious vs. Non-oblivious Computation. We distinguish between oblivious and non-oblivious computation in the following sense: In oblivious computation the parties learn nothing about the computation (except its output and runtime). Specifically, the parties learn nothing about the program \(\varPi \), the \(\mathsf {CPU}\)’s state, or the input \(\varvec{x}\). This means that an oblivious computation is applicable to private function evaluation (PFE), in which the function itself is kept secret. On the other hand, non-oblivious computation allows the parties to learn which instruction is being computed in which time step; in particular, it rules out algorithms that branch on secret values (otherwise, information about the secret values might be leaked). As noted in [28], in order to hide the instruction being computed, every possible instruction must be executed in every time step. The only implementation of an oblivious computation with active security that we know of is by Keller [22]. It achieves a throughput of \(41\,\mathrm{{Hz}}\) (physical memory accesses per second) in the online phase with \(1024\times 64\)-bit memory and \(2\,\mathrm{{Hz}}\) for \(2^{20}\times 64\)-bit memory, for 2 parties running over a local network. On the other hand, secure non-oblivious computation (denoted “instruction-trace oblivious” in [28]) is expected to yield a much better throughput, since the parties can avoid securely evaluating the universal CPU-step circuit and can instead simply evaluate a much smaller circuit corresponding to the current instruction.

While the theoretical results in this paper hold for oblivious computation, the implementation results we report hold only for the non-oblivious setting. This relaxation is justified by the fact that non-oblivious computation is applicable to plenty of useful algorithms, such as graph and search algorithms.

Achieving Efficient Protocols. To achieve efficient, actively secure RAM computation, the following crucial issues must be addressed:

  1.

    Round complexity. As explained above, securely evaluating a program requires \(T'=T\cdot \mathsf {poly}(\log N)\) rounds of interaction between \(\mathsf {CPU}\) and \(\mathsf {MEMORY}\), corresponding to the \(T'\) physical memory accesses. Also note that the access pattern of a program is determined by its input. Now, consider the CPU-step at time t from Eq. (1): the parties need to read \(D[i_{t}^{\mathsf {read} }]\) and map it to the input wires associated with \(b_{t}^{\mathsf {read} }\). However, they do not know in advance (i.e., at garbling time) which address \(i_{t}^{\mathsf {read} }\) will be accessed in which time step, and thus cannot map the input wire labels of \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}^t\) to the right memory location. Therefore, achieving a protocol with round complexity independent of T is much more challenging; in fact, there is a line of works proposing constant-round secure RAM computation [7, 11,12,13,14, 19, 30], but it is highly impractical. A more reasonable path, which we follow in this paper, is to construct a scheme with a constant number of rounds per any number of parallel physical memory accesses. Although there exist passively secure implementations [28, 29] that are constant-round per physical memory access, the actively secure implementations that we know of [22, 24] have a round complexity linear in the depth of the CPU-step circuit (which depends on the ORAM implementation).

  2.

    Private and authenticated memory. A natural approach to securely handling memory is to choose an ORAM that encrypts its memory contents. In this approach, the parties must evaluate CPU-step circuits that include encryption/decryption and authentication/verification sub-circuits. This is undesirable because the resulting construction is non-black-box in its underlying encryption/authentication primitives and, more practically, because embedding encryption and authentication sub-circuits in every CPU-step circuit adds a large overhead in terms of the computation, communication, and space complexities of garbling and evaluation. This is especially objectionable when the original RAM’s behavior is non-cryptographic. The circuitry overhead of the sub-circuit approach is demonstrated for several memory sizes in Table 1, where the circuit size is for a typical instruction that requires a memory access. Circuit size refers to the number of AND gates in the circuit performing a logical access, read and write are the numbers of bits being accessed, and encryption/authentication size is the number of AND gates that would be necessary when incorporating the encryption and authentication procedures inside the CPU-step circuit. We measure the overhead using both our technique (described in Sect. 4.1) and a trivial solution using the AES block cipher (with a circuit size of 6000 AND gates), assuming blocks of \(s=40\) bits. We can see that even with our improvement (due to the SPDZ representation of memory), securing memory accesses incurs additional circuitry that is about 45 times larger than the ORAM circuit itself; therefore, we are highly motivated to find other techniques for transferring memory from storage to circuits.

  3.

    Memory consumption. In actively secure BMR-based protocols, the memory used for storing the garbled circuit grows linearly with the number of gates and the number of parties. Let G be the number of AND gates, n the number of participants, and \(\kappa \) the security parameter. In the online phase each party stores \(4\cdot G\cdot n \cdot \kappa \) bits representing the garbled circuit and an additional \(2 \cdot G \cdot \kappa \) bits representing its own keys (the latter are needed to verify the authenticity of the keys revealed during the evaluation and to decide which garbled entry to use next). For example, the SHA1 circuit is composed of \(\sim \)236K gates, among them \(\sim \)90K AND gates. Evaluating SHA1 with \(\kappa =128\) incurs \(\sim \)160 Mb of memory for 3 parties and \(\sim \)0.5 Gb for 10 parties. While this amount is manageable for a single execution of a circuit, it is much harder to maintain when T garbled circuits are evaluated sequentially in an online phase, as needed in RAM computation. Thus, new techniques must be developed to address this issue.
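As a sanity check on these figures, the per-party storage formula above can be evaluated directly (the \(\sim \)90K AND-gate count is the one quoted in the text):

```python
def bmr_online_storage_bits(G, n, kappa=128):
    """Per-party online storage: 4*G*n*kappa bits for the garbled circuit
    plus 2*G*kappa bits for the party's own keys."""
    return 4 * G * n * kappa + 2 * G * kappa

G = 90_000  # ~90K AND gates in the SHA1 circuit
print(bmr_online_storage_bits(G, n=3) / 10**6)   # ~161 Mbits, i.e. the ~160 Mb above
print(bmr_online_storage_bits(G, n=10) / 10**9)  # ~0.48 Gbits, i.e. the ~0.5 Gb above
```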

Table 1. Proportion of additional circuitry, counted as the number of additional AND gates, for the purpose of memory encryption and authentication in a logical memory access.

1.2 Our Contribution

We construct and implement the first actively secure, garbled-circuit-based ORAM multiparty protocol. Specifically, we present the following contributions:

  1.

    Efficient Secure Memory Access. We propose and compare three techniques to implement memory access in a secure computation for RAM programs. We briefly describe them in increasing order of efficiency:

    (a)

      In the first technique, each party stores a SPDZ share of each memory data item. We stress that this technique has nothing to do with the SPDZ protocol; it only uses the SPDZ share representation for the memory contents. On each access, the data item is re-shared using fresh randomness from the parties. Since SPDZ shares are also authenticated, we obtain an authenticated memory as well. The re-sharing procedure is implemented as a sub-circuit, using only 2 field multiplications, embedded in every CPU-step circuit. For each s-bit block being accessed, the parties need to communicate \(O(sn^2\kappa )\) bits (by all parties together) to reveal the appropriate keys for the input wires of the next CPU-step circuit, where \(\kappa \) and s are the computational and statistical security parameters, respectively; s is also the size of a SPDZ share. This is because every CPU-step receives n shares, each of size s bits, and for every bit all parties broadcast their keys of size \(\kappa \) bits. The technique requires two rounds of broadcast per physical memory access; however, as explained above, embedding encryption and authentication sub-circuits has theoretical and practical disadvantages.
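      To make the first technique's share representation concrete, here is a toy sketch of SPDZ-style authenticated additive sharing over a prime field. The field size and the plain-Python routines are our illustration, not the paper's implementation:

```python
import random

P = 2**61 - 1  # a prime field; illustrative choice, not the paper's parameters

def share(x, n, alpha):
    """Additive shares of x and of its MAC alpha*x (mod P), one pair per party."""
    xs = [random.randrange(P) for _ in range(n - 1)]
    xs.append((x - sum(xs)) % P)
    ms = [random.randrange(P) for _ in range(n - 1)]
    ms.append((alpha * x - sum(ms)) % P)
    return list(zip(xs, ms))

def open_and_check(shares, alpha):
    """Reconstruct x and verify the MAC; returns (x, mac_ok)."""
    x = sum(s for s, _ in shares) % P
    mac = sum(m for _, m in shares) % P
    return x, mac == (alpha * x) % P

alpha = random.randrange(1, P)           # global MAC key (secret-shared in SPDZ)
sh = share(42, n=3, alpha=alpha)
print(open_and_check(sh, alpha))         # (42, True)

# Tampering with any share is caught by the MAC check.
sh[0] = ((sh[0][0] + 1) % P, sh[0][1])
print(open_and_check(sh, alpha)[1])      # False
```

      Re-sharing on each access corresponds to running `share` again on the value with fresh randomness; in the actual protocol this happens inside the CPU-step circuit.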

    (b)

      The second technique is inspired by [1, 32], in which the memory is implemented via wire soldering. That is, since every wire already carries a hidden and authenticated value through the key that is revealed to the parties, the key itself can be stored in memory. This way, the parties do not need to convert wire keys to data items back and forth on every access; instead, they solder wires directly from the “writing circuit” to the “reading circuit”. This “prunes away” the additional circuitry of the first technique, with the drawback that each bit in the ORAM memory is represented by a BMR key, i.e., \(n\kappa \) bits (with n the number of parties and \(\kappa \) the security parameter). This technique is superior in the other metrics, however: it requires far fewer triples to be generated in the offline phase, since it does not need the additional circuitry (which includes many AND gates), and it takes 2 communication rounds per physical memory access, just like the first technique.

      Naively generalizing the soldering of [1] to the multiparty setting requires each party to commit to its keys to all other parties using a XOR-homomorphic commitment scheme. Instead, in this work we obviate the need for a commitment scheme and show how to use the readily available shares of the keys. Moreover, we show how to do this black-box in the BMR garbling functionality, even when using the Free-XOR optimization [2], under which different garbled circuits are assigned different global differences.

    (c)

      The third technique offers a clever improvement to the soldering: we solder only one bit, namely the real value that passes through a wire, instead of the whole key representing that value. As such, the soldering requires no offline overhead at all; that is, in contrast to the second technique, this technique does not invoke the multiplication command of the underlying MPC. We utilize the fact that the BMR evaluation procedure reveals to the parties the external bit of each output wire (that is associated with a bit to be written to memory) and the fact that the permutation bits are already shared. This way the parties can obtain a share of a single bit, which is the XOR (addition in the binary field) of the external and permutation bits.
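      A toy sketch of this bit-soldering idea (variable names and the XOR-sharing helper are our illustration): the evaluation makes the external bit \(e = v \oplus \lambda \) public, the permutation bit \(\lambda \) is already XOR-shared, so a designated party folding e into its \(\lambda \)-share yields an XOR-sharing of the real value v, with no communication and no multiplications:

```python
import functools
import random

def xor_shares(bit, n):
    """Split a bit into n XOR-shares."""
    sh = [random.randrange(2) for _ in range(n - 1)]
    sh.append(bit ^ functools.reduce(lambda a, b: a ^ b, sh, 0))
    return sh

n = 3
v = 1                                   # real value on the output wire
lam = random.randrange(2)               # permutation bit lambda_w, secret-shared
lam_shares = xor_shares(lam, n)
e = v ^ lam                             # external bit, public after BMR evaluation

# Locally: party 0 XORs the public external bit into its lambda-share.
v_shares = [lam_shares[0] ^ e] + lam_shares[1:]
recovered = functools.reduce(lambda a, b: a ^ b, v_shares, 0)
print(recovered == v)  # True
```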

    Although the third technique is the most promising, as it is the most efficient in all parameters (see Table 2 for a comparison), the first and second techniques are also beneficial, since they work at a higher level of abstraction and assume less about the circuit-garbling functionality. In particular, the first technique can be used with any underlying circuit-based protocol for evaluating the CPU-step circuits. The second technique requires an underlying protocol that relies on the idea of two keys per wire, such as the BMR construction; however, it assumes nothing about the way BMR is implemented (recall that BMR on its own uses another MPC protocol to garble the gates). In contrast, the third technique assumes a specific implementation of BMR that shares the wires’ permutation bits among the parties. The SPDZ protocol family satisfies this last requirement, and we therefore use it in our implementation.

  2.

    Reduced Round and Space Complexities. As opposed to [22, 24], which require communication rounds for every layer of \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\), and [29], which achieves only passive security, our protocol is constant-round per physical memory access. As mentioned above, the parties can travel from one CPU-step to the next by simply performing SPDZ openings, which appears more efficient than using XOR-homomorphic commitments to wire labels in a cut-and-choose based protocol such as [1] (for 2PC).

    We show that by representing memory as “packed shares” the parties need to store only 2 bits per bit in the ORAM (that is, to operate an ORAM with \(N'\)-bit storage the parties need to store \(2N'\) bits). To the best of our knowledge this is the best concrete overhead achieved to date. In contrast, other BMR-based protocols, such as one instantiated with our second technique, require each party to store \(n\kappa \) bits per bit in the ORAM.

    We further devise a way to shrink the storage required by each party in the online phase. When using a garbling scheme that produces a garbled circuit whose size is independent of the number of parties (as recently proposed [3]), our optimization leads to a decrease in memory consumption by a factor of up to 2. We stress that this improvement is applicable to all BMR-based constructions. We present it and prove its security in Sect. 6.
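    To put the storage figures above side by side, a quick back-of-the-envelope comparison (the \(2N'\) and \(n\kappa \) per-bit costs are those quoted above; the parameter choices are illustrative):

```python
def packed_storage_bits(N_prime):
    """Packed shares: 2 bits per ORAM bit, per party."""
    return 2 * N_prime

def key_storage_bits(N_prime, n, kappa=128):
    """Key-based soldering (second technique): n*kappa bits per ORAM bit, per party."""
    return n * kappa * N_prime

N_prime = 2**20  # a 1-Mbit ORAM store
print(key_storage_bits(N_prime, n=3) // packed_storage_bits(N_prime))  # 192x larger
```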

  3.

    Implementation. We implemented the protocol using our most efficient memory access technique and obtained experimental access-time results in both LAN and simulated WAN environments for two and three participants. In addition, we provide a comparison with the previous implementation of Keller and Scholl [6, 24], which is based purely on SPDZ. Our experiments show that [24] performs better over LAN (up to a factor of two for two parties) while our work does so over WAN (by one order of magnitude for two parties), justifying our efforts to reduce communication rounds. This supports the analysis that garbled circuits are more suitable for a setting with high latency, because computation on secret values (after obtaining the garbled circuit) can be done entirely locally. Note, however, that we still require communication for revealing memory addresses and transferring memory values to garbled-circuit wires. This is not the case for the trivial (asymptotically more expensive) approach in which the whole memory is scanned for every access. We also implemented the latter and found that our protocol breaks even at a memory size in the thousands for the LAN setting and in the hundreds of thousands for the WAN setting.

We stress that even though [38] also achieves a constant-round multiparty protocol for circuit-based computation (i.e., not RAM programs), our third and most efficient technique is not directly applicable to their construction. In particular, our technique relies on the fact that all parties can verify the correctness of wire labels without communication. This is the case for BMR, because every party learns both possibilities for its \(\kappa \) bits of every wire label; in the above work this holds for only one of the two parties. We therefore leave it as an open problem how to combine the two techniques and how a possible combination would compare to our work.

1.3 Related Work

Gordon et al. [18] (following the work of Ostrovsky and Shoup [33], which was tailored specifically to PIR) designed the first general two-party, semi-honest, secure computation protocol for RAM. Their work focuses on the client-server setting, where the client has a small input and the server has a large database, and requires the client to maintain only a small storage (i.e., logarithmic in the size of the database). Their technique relies on a one-time initialization of the ORAM, after which the server stores an encrypted version of the memory; the parties then iteratively engage in a traditional, circuit-based, secure two-party computation for every ORAM instruction.

Garbled RAM, introduced by Lu and Ostrovsky [30], is the analogue of a garbled circuit for RAM programs. Namely, a user can garble an arbitrary RAM program directly, without first converting it into a circuit. A garbled RAM scheme can be used to garble the memory, the program, and the input in a way that reveals only the evaluation outcome and nothing else. The main advantage of garbled RAM is that it leads to constant-round two-party and multiparty protocols in both the semi-honest and malicious settings. This is reflected in a series of works on variations of garbled RAM [11,12,13,14, 19]; however, all of these works focus on feasibility rather than efficiency and are impractical.

Afshar et al. [1] presented two actively secure protocols for the two-party setting: one in the offline-online model and one for streaming. The main idea in both of their schemes is encoding RAM memory via wire labels. When the program reads from memory location \(\ell \), it is possible to reuse the appropriate output wire labels from the most recent circuit that wrote to location \(\ell \) (which is not necessarily the previous circuit). These protocols require the parties to coordinate before the evaluation of each CPU-step, either by soldering techniques that require XOR-homomorphic commitments for aligning wire labels (based on [10, 32]) or by invocations of oblivious transfer to allow evaluation of the next garbled circuits, in addition to a large number of symmetric operations for garbling, encrypting, and decrypting s copies of the circuit (since the cut-and-choose technique is used). Overall, this incurs an additional overhead of \(O(sn\kappa )\), since for each input wire of each of the O(s) garbled circuits, each party would need to commit and open its XOR-homomorphic commitment, with computational security parameter \(\kappa \). Moreover, the streaming version requires both the garbler and the evaluator to maintain O(s) copies of the memory. That work was followed by [20, 31], which achieve a constant-round protocol for ZKP of non-algebraic statements in the RAM model, but not for secure computation.

Keller and Scholl [24] showed how to implement two ORAM variants for the oblivious array and oblivious dictionary data structures; specifically, they compared their implementations of the binary Tree ORAM [34] and Path ORAM [35] using various optimizations for many parts of the ORAM algorithms. Their implementations of the secure oblivious array and dictionary are based purely on the SPDZ protocol; hence, they have no use for the techniques we develop in this paper, because the memory in their work is represented exactly as the secret state of the program is, so no conversion between those two entities (memory and state) is required. Due to their use of a secret-sharing-based MPC with the SPDZ authenticated-share representation, multiplication gates are evaluated interactively such that the products are immediately authenticated; thus, parties can use the memory as ordinary shared secrets and verify authenticity only once, when the evaluation is finished. The drawback of their approach is the high round complexity that it imposes on top of the ORAM round complexity. In our protocol, multiplications are evaluated inside a circuit, and the authentication of the result is not an integral part of the multiplication itself (as it is in the SPDZ protocol).

Doerner and Shelat [9] recently published a two-party passively secure computation for RAM programs and reported that it outperforms previous works, even those implemented using state-of-the-art ORAM schemes, up to large memory sizes such as \(2^{32}\) elements of 4 bytes. Their Distributed ORAM scheme (a.k.a. Floram) is derived from the Function Secret Sharing (FSS) scheme for point functions by Boyle et al. [4, 5]. It resembles the trivial ORAM that reads/writes all memory addresses on every access in order to hide its access pattern; however, this cost is mitigated because those O(N) accesses are performed by highly parallelizable local computation. The main advantage of Floram is that it requires only O(1) communication rounds for both initialization and memory access, and it does not require secure computation at all for the initialization. We remark that even though FSS is feasible in the multiparty setting, it does not offer the same optimizations there as it does in the two-party setting; thus, Floram is currently not suitable for the multiparty setting. In addition, it is not trivial to lift their scheme to active security.

2 Preliminaries

Relying on the notation and description of the RAM model of computation presented in Sect. 1.1, we proceed directly to the definition of Oblivious RAM:

2.1 Oblivious RAM

A polynomial time algorithm C is an Oblivious RAM (ORAM) compiler with computational overhead \(c(\cdot )\) and memory overhead \(m(\cdot )\), if C, when given a security parameter \(\kappa \) and a deterministic RAM program \(\varPi \) with memory D of size N, outputs a program \(\varPi '\) with memory \(D'\) of size \(N'=m(N)\cdot N\), such that for every input \(x\in \{0,1\}^*\) the running time of \(\varPi '(D',x)\) is bounded by \(T'=T\cdot c(N)\) and there is a negligible function \(\mu \) such that the following properties hold:

  • Correctness. For every memory size \(N\in \mathbb {N}\) and every input \(x\in \{0,1\}^*\) with probability at least \(1-\mu (\kappa )\), the output of the compiled program equals the output of the original program, i.e. \(\varPi '(D',x)=\varPi (D,x)\).

  • Obliviousness. For every two programs \(\varPi _1,\varPi _2\), every \(D_1,D_2\) of size N and every two inputs \(x_1,x_2\in \{0,1\}^*\), if the running times of \(\varPi _1(D_1,x_1)\) and \(\varPi _2(D_2,x_2)\) are T, then

    $$ \mathsf {AP}(C(\varPi _1,\kappa ),D_1,x_1) \, {\mathop {\equiv }\limits ^{c}} \, \mathsf {AP}(C(\varPi _2,\kappa ),D_2,x_2) $$

    where \(\mathsf {AP}(\cdot )\) is the access pattern as defined in Sect. 1.1.

As reflected in the above definition, our ORAM scheme is required to hide only the addresses that \(\mathsf {CPU}\) accesses, since we handle the privacy and authenticity of the memory contents using other techniques. Also note that the definition does not require hiding the runtime of the program.
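For intuition, the simplest compiler satisfying the obliviousness property (with the worst possible overhead, \(c(N)=N\)) makes every logical access scan all N addresses, so the physical access pattern is a fixed sequence independent of the program and its input. A sketch of one such access (our illustration):

```python
def oblivious_access(D, i, write_val=None):
    """Read D[i] (and optionally overwrite it) while touching every address,
    so the physical access pattern never depends on i."""
    result = None
    trace = []
    for j in range(len(D)):
        trace.append(j)              # physical access pattern: always 0..N-1
        if j == i:
            result = D[j]
            if write_val is not None:
                D[j] = write_val
    return result, trace

D = [7, 8, 9]
v1, t1 = oblivious_access(D, 0)
v2, t2 = oblivious_access(D, 2)
print(t1 == t2)      # True: identical pattern regardless of the address
print((v1, v2))      # (7, 9)
```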

2.2 Secure Computation in the RAM Model

Informally, a secure protocol for RAM programs must hide both the program’s access pattern and its memory contents from the parties. In addition, it must keep the memory “fresh”; that is, it must prevent the adversary from plugging an outdated memory block into the current CPU-step circuit.

Protocols in this model [13, 14, 19] typically admit two flavors of security definitions, so that their constructions can be modular: first achieve a construction for the weaker security notion (usually called Unprotected Memory Access) and then enhance it with an ORAM to achieve full security. Informally, the definition of full security requires that the access pattern remain hidden; that is, the ideal adversary obtains only the runtime T of the program \(\varPi \) and the computation output \(\varvec{y}\). Given only T and \(\varvec{y}\), the simulator must be able to produce an indistinguishable access pattern. The weaker notion of security, also known as Unprotected Memory Access (UMA), leaks the memory contents as well as the access pattern to the adversary. In fact, UMA-secure protocols deal only with how to authentically pass a memory block written in the past to a circuit that needs to read it at a later point in time. In this work we use the same definition of full security; however, instead of UMA we use a different definition, called Unprotected Access Pattern (UAP). The UAP definition is stronger than UMA, since it requires that the memory contents remain hidden from the adversary (only the access pattern is leaked). Recall that since our construction is for non-oblivious computation (see Sect. 1.1), in both security notions the adversary receives the instruction pattern as well.

Obviously, using a standard ORAM scheme we can easily transform a UAP-secure protocol into a fully secure one [19]; therefore, we may focus on the weaker notion (although our implementation achieves full security). We proceed to define both notions.

Full Security. Following the simulation paradigm [15, Chap. 7] we present the ideal and real models of execution of RAM programs.

Execution in the ideal model. In an ideal execution with \(\mathcal {F}_{\scriptscriptstyle \mathrm {RAM}}\) (Fig. 1), the parties submit their inputs to a trusted party, which in turn executes the program and returns the output. Let \(\varPi \) be a program with memory D of size N that expects n inputs \(\varvec{x}=x^1,\ldots ,x^n\), let \(\mathcal {A}\) be a non-uniform PPT adversary and let \(I\subset [n]\) be the set of indices of the parties that \(\mathcal {A}\) corrupts; we refer to the set of corrupted parties by \(p_I\). The ideal execution of \(\varPi \) on \(\varvec{x}\), auxiliary input z to \(\mathcal {A}\) and security parameter \(\kappa \), denoted by the random variable \(\mathbf{{IDEAL}}^{\mathcal {F}_{\scriptscriptstyle \mathrm {RAM}}}_{\mathcal {A}(z),I}(\kappa ,\varPi ,D,\varvec{x})\), is defined as the output set of the honest parties and the adversary \(\mathcal {A}\).

Fig. 1. Ideal execution of \(\varPi (N,\varvec{x})\) with abort.

Execution in the real model. In the real model there is no trusted party and the parties interact directly. The adversary \(\mathcal {A}\) sends all messages in place of the corrupted parties and may follow an arbitrary PPT strategy, whereas the honest parties follow the protocol. Let \(\varPi ,D,\mathcal {A},I\) be as above and let \(\mathcal {P}\) be a multiparty protocol for computing \(\varPi \). The real execution of \(\varPi \) on input \(\varvec{x}\), auxiliary input z to \(\mathcal {A}\) and security parameter \(\kappa \), denoted by the random variable \(\mathbf{{REAL}}^\mathcal {P}_{\mathcal {A}(z),I}(\kappa ,\varPi ,D,\varvec{x})\), is defined as the output set of the honest parties and the adversary \(\mathcal {A}\).

Definition 2.1

(Secure computation). Protocol \(\mathcal {P}\) is said to securely compute \(\varPi \) with abort in the presence of a malicious adversary if for every PPT adversary \(\mathcal {A}\) in the real model, there exists a PPT adversary \(\mathcal {S}\) in the ideal model, such that for every \(I \subset [n]\), every \(\varvec{x},z\in \{0,1\}^*\) and all large enough \(\kappa \), the following holds:

$$\begin{aligned} \left\{ \mathbf{{IDEAL}}^{\mathcal {F}_{\scriptscriptstyle \mathrm {RAM}}}_{\mathcal {S}(z),I}(\kappa ,\varPi ,D,\varvec{x}) \right\} _{\kappa ,\varvec{x},z} \, {\mathop {\equiv }\limits ^{c}} \, \left\{ \mathbf{{REAL}}^\mathcal {P}_{\mathcal {A}(z),I}(\kappa ,\varPi ,D,\varvec{x}) \right\} _{\kappa ,\varvec{x},z} \end{aligned}$$

Unprotected Access Pattern (UAP) Security. This notion allows the adversary to further inspect the access pattern. The ideal functionality \(\mathcal {F}_\mathsf {UAP}\) is given in Fig. 2 and realized by protocol \(\mathcal {P}_{\mathsf {UAP}}\) (Fig. 7).

Fig. 2. Ideal execution of \(\varPi (N,x)\) in the UAP model.

Definition 2.2

(Secure computation in the UAP model). Protocol \(\mathcal {P}\) is said to securely compute \(\varPi \) in the UAP model with abort in the presence of a malicious adversary if for every PPT adversary \(\mathcal {A}\) in the real model, there is a PPT adversary \(\mathcal {S}\) in the ideal model, such that for every \(I\subset [n]\), every \(\varvec{x},z\in \{0,1\}^*\) and all large enough \(\kappa \):

$$\begin{aligned} \left\{ \mathbf{{IDEAL}}^{\mathcal {F}_\mathsf {UAP}}_{\mathcal {S}(z),I}(\kappa ,\varPi ,D,\varvec{x}) \right\} _{\kappa ,\varvec{x},z} \, {\mathop {\equiv }\limits ^{c}} \, \left\{ \mathbf{{REAL}}^\mathcal {P}_{\mathcal {A}(z),I}(\kappa ,\varPi ,D,\varvec{x}) \right\} _{\kappa ,\varvec{x},z} \end{aligned}$$

The transformation (or compilation) from UAP to full security is not within the scope of this paper and can be found in previous works [11,12,13,14, 19]. We follow that path since it makes the security analysis simpler and modular, rather than proving full security from scratch. Therefore, functionality \(\mathcal {F}_\mathsf {UAP}\) in Fig. 2, which is realized by protocol \(\mathcal {P}_{\mathsf {UAP}}\) (Fig. 7), reveals the access pattern to the parties. By incorporating an ORAM scheme on top of our protocol, that access pattern is of no use to the adversary, since the access pattern of a program execution using an ORAM is indistinguishable from the access pattern of a randomly chosen program with the same runtime.

We note that a UAP-secure protocol may be useful on its own (i.e. without lifting it to full security) in cases where the original program \(\varPi \) is oblivious, that is, when the access pattern is permitted to leak to the parties.

3 Executing RAM Programs Using BMR

Our protocol follows the BMR-SPDZ approach [25, 27] and adapts the free-XOR technique for the BMR garbling scheme [2]. For completeness, in the following we describe the structure of the actively secure additive secret sharing used in SPDZ-like protocols and outline the BMR-SPDZ approach.

3.1 SPDZ Secret Sharing

SPDZ-like protocols use actively secure additive secret sharing over a finite field, combined with information theoretic MACs to ensure active security. A shared secret \(x\in \mathbb {F}\) is represented by

$$ \llbracket x\rrbracket =([x],[m(x)], [\alpha ])=(x^1,\ldots ,x^n,m(x)^1,\ldots ,m(x)^n, \alpha ^1,\ldots ,\alpha ^n) $$

where \(m(x)=x\cdot \alpha \) is a MAC on the message x under a global key \(\alpha \). Party \(p_i\) holds a uniformly random share \(x^i\) of x, a uniformly random share \(m(x)^i\) of m(x) and a uniformly random share \(\alpha ^i\) of \(\alpha \), such that

$$ x = \sum _{i=1}^n x^i ,\,\, \,\,\,\,\,\,\,\,\,\, m(x) = \sum _{i=1}^n m(x)^i ,\,\, \,\,\,\,\,\,\,\,\,\, \alpha = \sum _{i=1}^n \alpha ^i $$

We denote an additive secret shared value x by \([x]\) and its authenticated shared version by \(\llbracket x\rrbracket \). We also denote \(p_i\)’s share by \(\llbracket x\rrbracket ^i=(x^i,m(x)^i)\).

When opening a shared value \(\llbracket x\rrbracket \), the parties first broadcast their shares \(x^i\) and compute x. To ensure that x is correct, they then check the MAC by committing to and opening \(m(x)^i-x\cdot \alpha ^i\) and checking that these values sum to zero.
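As a concrete illustration, the following Python sketch simulates this sharing and the open-and-check procedure in the clear (the prime field and all names are our own illustrative choices; in the actual protocol the \(m(x)^i-x\cdot \alpha ^i\) values are committed to before being opened):

```python
import secrets

P = 2**61 - 1  # illustrative prime field modulus

def share(x, n):
    """Additively share x among n parties over F_P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def authenticate(x, alpha, n):
    """Produce [[x]] = ([x], [m(x)]) under the global MAC key alpha."""
    return share(x, n), share(x * alpha % P, n)

def open_and_check(x_shares, mac_shares, alpha_shares):
    """Open x, then verify the MAC relation m(x) = x * alpha."""
    x = sum(x_shares) % P
    # Each party opens m(x)^i - x * alpha^i; the openings must sum to zero.
    sigma = [(m - x * a) % P for m, a in zip(mac_shares, alpha_shares)]
    assert sum(sigma) % P == 0, "MAC check failed"
    return x

n = 3
alpha_shares = share(secrets.randbelow(P), n)
alpha = sum(alpha_shares) % P
x_shares, mac_shares = authenticate(42, alpha, n)
print(open_and_check(x_shares, mac_shares, alpha_shares))  # prints 42
```

The check passes because the opened differences sum to \(m(x)-x\cdot \alpha =0\) exactly when the broadcast shares reconstruct the authenticated x.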

3.2 The BMR-SPDZ Protocol

Unlike the two-party setting, in which we have one garbler and one evaluator, in the multiparty setting all parties are both garblers and evaluators, such that no strict subset of the parties can either influence or learn anything about the values that the wires carry. In the following we present the key points of the BMR-SPDZ approach:

Keys. Every party chooses a random key for each wire in the circuit, that is, party \(p_i\) chooses key \(k_w^i\in \mathbb {F}_{2^\kappa }\) for wire w. This key is called the “0-key” and denoted \(k_{w,0}^i\); it is essentially the i-th coordinate of the full 0-key \(k_{w,0}=(k_{w,0}^1,\ldots ,k_{w,0}^n)\in (\mathbb {F}_{2^\kappa })^n\).

Global difference. To enable free-XOR, each party chooses its own global difference, that is, party \(p_i\) randomly chooses \(\varDelta _i\) such that the difference between its 0-key and its 1-key is \(\varDelta _i\). Formally, \(k_{w,1}^i=k_{w,0}^i\oplus \varDelta _i\) for every w and i. Similarly, \(\varDelta _i\) is the i-th coordinate of the full difference \(\varDelta =(\varDelta _1,\ldots ,\varDelta _n)\). The value \(\varDelta _i\) is known only to party \(p_i\), and no strict subset of the parties (that does not include \(p_i\)) can learn it. For wire w we get that \(k_{w,1}=k_{w,0}\oplus \varDelta \), where \(\oplus \) operates component-wise.

Permutation bits. In the course of the evaluation the parties obtain \(k_{w,b}\) with either \(b=0\) or \(b=1\) for every wire w. Party \(p_i\) can easily check whether \(b=0\) or \(b=1\) by extracting the i-th element from \(k_{w,b}\) and comparing it to \(k_{w,0}^i\) and \(k_{w,1}^i\). If \(b=0\) we say that the external value of wire w, denoted \(\varLambda _w\), is 0; otherwise, if \(b=1\), then \(\varLambda _w=1\). Since the real value that is carried by wire w, denoted by \(\rho _w\), must be kept secret, the external value \(\varLambda _w\) must reveal nothing about it. To this end, a random permutation bit, \(\lambda _w\), is assigned to each wire w in order to mask \(\rho _w\) by setting \(\varLambda _w=\lambda _w\oplus \rho _w\).

Inputs. Let w be an input wire that is associated with input \(x^i\) of party \(p_i\), then the parties open \(\lambda _w\) to party \(p_i\) only. Then \(p_i\) broadcasts \(\varLambda _w\) and \(k^i_{w,{\varLambda _w}}\) where \(\varLambda _w=\rho _w\oplus \lambda _w\) and \(\rho _w\) is its input to wire w. Then, party \(p_j\), for all j, broadcasts its \(\varLambda _w\)-key \(k_{w,\varLambda _w}^j\) such that all parties obtain \(k_{w,\varLambda _w}=(k_{w,\varLambda _w}^1, \ldots , k_{w,\varLambda _w}^n)\).

Outputs. If w is an output wire then the parties open the permutation bit \(\lambda _w\) to everyone. This way, upon obtaining key \(k_{w,\varLambda _w}\) the parties learn the real value of w by \(\rho _w=\varLambda _w\oplus \lambda _w\).

Encrypting a key. In the process of garbling, the parties encrypt the key of a gate’s output wire using the keys of its input wires. Let \(m=m^1,\ldots ,m^n\) be the key to be encrypted and let \(k_\ell ,k_r\) with \(k_b=k_b^1,\ldots ,k_b^n\) be the encryption keys of the left and right input wires, where party \(p_i\) holds \(m^i,k_\ell ^i,k_r^i\). The parties produce the ciphertext \(c=c^1,\ldots ,c^n\) as follows: \(m^j\) is encrypted under \(k_\ell ,k_r\) to yield \(c^j\), such that even a single missing coordinate of \(k_\ell \) or \(k_r\) prevents one from decrypting \(c^j\). To encrypt \(m^j\), party \(p_i\) provides \(F_{k_\ell ^i}(j),F_{k_r^i}(j)\), where F is a pseudorandom function, and then, using a protocol for secure computation, the parties evaluate and output:

$$ c^j = \mathsf {Enc}_{k_\ell ,k_r}(m^j) = \bigg ( \bigoplus _{i=1}^{n} F_{k_\ell ^i}(j) \bigg ) \oplus \bigg ( \bigoplus _{i=1}^{n} F_{k_r^i}(j) \bigg ) \oplus m^j $$

Note that the keys \(k_\ell ^i,k_r^i\) are necessary for the decryption of \(c^j\) for every \(i,j\in [n]\).
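The following sketch illustrates this encryption in the clear, with HMAC standing in for the pseudorandom function F (an arbitrary stand-in of ours; the protocol only requires pseudorandomness). XOR-ing the same pads again recovers \(m^j\), which is exactly why every key coordinate is needed for decryption:

```python
import hmac, hashlib, secrets

KAPPA = 16  # key length in bytes (stands in for kappa bits)

def prf(key: bytes, j: int) -> bytes:
    """Stand-in PRF F_k(j); any PRF would do here."""
    return hmac.new(key, j.to_bytes(4, "big"), hashlib.sha256).digest()[:KAPPA]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_coordinate(j, m_j, left_keys, right_keys):
    """c^j = (XOR_i F_{kl^i}(j)) xor (XOR_i F_{kr^i}(j)) xor m^j."""
    c = m_j
    for kl, kr in zip(left_keys, right_keys):
        c = xor(c, xor(prf(kl, j), prf(kr, j)))
    return c

n = 3
kl = [secrets.token_bytes(KAPPA) for _ in range(n)]
kr = [secrets.token_bytes(KAPPA) for _ in range(n)]
m1 = secrets.token_bytes(KAPPA)
c1 = encrypt_coordinate(1, m1, kl, kr)
# Decrypting with all 2n key coordinates recovers m^j.
assert encrypt_coordinate(1, c1, kl, kr) == m1
```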

Garbled gate. A garbled version of an AND gate g with input wires u, v and output wire w is simply a four-entry table, where each entry is an encryption of either \(k_{w,0}\) or \(k_{w,1}\), depending on the permutation bits \(\lambda _u,\lambda _v\) and \(\lambda _w\). We want to enable the evaluator, who holds \(k_{u,\varLambda _u}\) and \(k_{v,\varLambda _v}\) (which encode \(\rho _u\) and \(\rho _v\) respectively), to decrypt the ciphertext in the \((2\varLambda _u+\varLambda _v)\)-th entry of the table and obtain \(k_{w,\varLambda _w}\) such that \(\rho _w=\rho _u\cdot \rho _v\). That is, we want \(\lambda _w\oplus \varLambda _w = (\lambda _u\oplus \varLambda _u)\cdot (\lambda _v\oplus \varLambda _v)\); thus, the \((2\varLambda _u+\varLambda _v)\)-th entry conceals \(k_{w,\varLambda _w}\) where

$$ \varLambda _w = (\lambda _u\oplus \varLambda _u)\cdot (\lambda _v\oplus \varLambda _v)\oplus \lambda _w $$

and since \(k_{w,1}=k_{w,0}\oplus \varDelta \) we get that the entry conceals

$$ k_{w,0} + \varLambda _w \cdot \varDelta = k_{w,0} + \big ((\lambda _u\oplus \varLambda _u)\cdot (\lambda _v\oplus \varLambda _v)\oplus \lambda _w \big )\cdot \varDelta $$

We conclude by presenting functionality \(\mathcal {F}_\mathsf {BMR}\) (Fig. 3) for the construction of a garbled circuit. Note that the only difference between \(\mathcal {F}_\mathsf {BMR}\) and the standard description of this functionality [2, 25] is that here the functionality lets the parties learn shares of the permutation bits \(\lambda _w\). This is necessary in order to obtain a clean security proof of the construction. Protocol \(\mathcal {P}_\mathsf {BMR}\) (Fig. 5) realizes \(\mathcal {F}_\mathsf {BMR}\) in the \(\mathcal {F}_{\mathsf {MPC}}\)-hybrid model (Fig. 11 in the appendix). Given a garbled circuit, the parties evaluate it using the \(\mathcal {E}_\mathsf {BMR}\) procedure described in Fig. 4.

In the presentation of the protocol (Fig. 5) and in the rest of the paper, we denote by \(\langle x\rangle \) the handle (varid) of a variable x that is stored by \(\mathcal {F}_{\mathsf {MPC}}\).

Fig. 3. The BMR functionality.

Fig. 4. Evaluation of a BMR garbled circuit.

Fig. 5. Realizing \(\mathcal {F}_\mathsf {BMR}\) in the \(\mathcal {F}_{\mathsf {MPC}}\)-hybrid model.

3.3 Towards RAM Computation

To be able to securely compute RAM programs (in the UAP model), the parties garble T circuits \(GC^1,\ldots ,GC^T\) and then evaluate them sequentially. To this end, we must specify how the parties obtain the keys intended for the input wires of each garbled circuit (these are the input wires associated with the values \(\mathsf {state}_{t} \) and \(b_{t}^{\mathsf {read} }\)). This task is divided into two: First, the input wires of \(GC^t\) associated with \(\mathsf {state}_{t} \) must carry the same values as the output wires associated with \(\mathsf {state}_{t} \) in \(GC^{t-1}\). Second, we need to support secure memory access, that is, the input wires of \(GC^t\) associated with \(b_{t}^{\mathsf {read} }\) must carry the same values as the output wires associated with \(b_{t'}^{\mathsf {write} }\) in \(GC^{t'}\), where \(t'\) is the most recent timestep in which address \(i_{t-1}^{\mathsf {read} }\) was modified. The first task can easily be achieved by changing \(\mathcal {P}_\mathsf {BMR}\) to choose the same keys for the output and input wires associated with the same state in every two consecutive garbled circuits; however, this would be non-black-box in \(\mathcal {F}_\mathsf {BMR}\) (since the functionality chooses its keys independently for every circuit). For a black-box solution we can use the techniques described in Sects. 4.2 and 4.3. We stress, though, that the two tasks are orthogonal and the techniques chosen to complete them are independent. Therefore, in the rest of the presentation we focus on realizing secure memory access (the second task) while taking the transfer of the CPU’s state for granted (i.e. we may write “the parties obtain the input wires of \(\mathsf {state}_{t} \)” without specifying how).

4 Accessing Memory

In this section we present the three techniques to achieve secure memory accesses and show how to realize \(\mathcal {F}_\mathsf {UAP}\) in the \(\mathcal {F}_\mathsf {BMR}\)-hybrid model using the third one. We compare the performance of the techniques in Table 2. The values within the table are explained alongside the description of the techniques.

Table 2. Performance of the three techniques with n parties. \(\kappa \) and s are the computational and statistical security parameters. The columns specify the following parameters: the number of input wires required in the CPU-step circuit for every input bit (of the ORAM) that is read by that circuit; the amount of communication required for each memory access, measured in bits per input wire per party; and the number of communication rounds required from the moment the parties obtain the keys of the output wires of \(GC^t\) to the moment they obtain the keys of the input wires of \(GC^{t+1}\). Communication rounds can be used for opening a secret, broadcasting a value or performing a multiplication over shares; among the three, only multiplication requires additional work in the offline phase, specifically, sacrificing a multiplication triple. This is reflected in the Triples column; note that we multiply a vector of n keys rather than a single key. The memory-overhead column specifies how many bits are stored in memory for a single bit of ORAM memory (again, per party; the total memory size that the parties store should be multiplied by n). The last column specifies whether a change in the garbled circuit is needed.
Fig. 6. Enhanced CPU-step circuit \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}\)

In the presentation below, we group some sets of input/output wires together, according to their purpose, as follows: \(W_\mathsf {in}\) refers to the input wires of \(GC^1\), which correspond to the parties’ inputs, where \(W_{\mathsf {in},j}^i\) corresponds to the j-th bit of input \(x^i\). \(W_\mathsf {read}^t\), \(W_\mathsf {write}^t\), \(W^t_{\mathsf {addr, rd}}, W^t_{\mathsf {addr, wr}}\) refer to the input \(b_{t}^{\mathsf {read} }\), output \(b_{t}^{\mathsf {write} }\) and addresses \(i_{t}^{\mathsf {read} }, i_{t}^{\mathsf {write} }\), respectively, in \(GC^t\). In addition, \(W_\mathsf {state} ^{t}\) refers to the input state and \(W_{\mathsf {state} '}^t\) refers to the output state in \(GC^t\).

4.1 Memory via Embedded Authentication Sub-circuit

This technique assumes that the values \(b_{t}^{\mathsf {read} },b_{t}^{\mathsf {write} }\) are elements of the same field that SPDZ, as the underlying MPC protocol, uses. This way, if SPDZ’s statistical parameter is s, then the memory is divided into data items of s bits. We enhance \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\) with two procedures, \(\mathsf {Verify}\) and \(\mathsf {AuthShare}\), and denote the result by \(\mathrm {C}_{{\scriptscriptstyle \mathrm {CPU}}^+}\) (see Fig. 6). Note that each party has 4s input wires for the purpose of authentication and sharing (assuming that the global MAC key is part of the state).

Privacy. The parties maintain their memory in the form of SPDZ shares; thus, to input the content of location \(i^{\mathsf {read} }\) in memory, every party inputs its SPDZ share of this content from its own storage \(D[i^{\mathsf {read} }]\). The secret is then reconstructed within the CPU-step circuit. Since this is an additive secret sharing scheme, the content is reconstructed using only XOR gates, which require no communication.

Verify Authenticity. We enhance the CPU-step circuit with a sub-circuit that verifies the authenticity of the secret \(b^{\mathsf {read} }\); the sub-circuit is denoted \(\mathsf {Verify}(\llbracket v\rrbracket )\), where v refers to \(b^{\mathsf {read} }\). Party \(p_i\) inputs \((v^i,m(v)^i,\alpha ^i)\) and the sub-circuit computes:

$$ v=\sum _{i=1}^{n} v^i ,\,\, \,\,\,\,\,\,\,\, m(v)=\sum _{i=1}^{n} m(v)^i ,\,\, \,\,\,\,\,\,\,\, \alpha =\sum _{i=1}^{n} \alpha ^i $$

and outputs \(\mathsf {ver}=1\) if \(m(v)-(\alpha \cdot v)=0\) (verification succeeded) and 0 otherwise (verification failed). This incurs \(s^2\) AND gates for a single multiplication operation, in addition to \(2s-1\) AND gates for deciding whether \(\mathsf {ver}\) is 1 or 0. Note that this multiplication is over a polynomial ring over \(\mathbb {F}_2\); thus the addition involves only XOR. Furthermore, we check the result directly for zero and skip the reduction modulo an irreducible polynomial (mapping from \(\mathbb {F}_2[X]\) to \(\mathbb {F}_{2^s}\)), hence the \(2s-1\) AND gates for the comparison.

Security. Since nothing is revealed except the fact that the authenticity check passed, the adversary cannot extract any information about the value.
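The logic of the \(\mathsf {Verify}\) sub-circuit can be sketched as follows; for illustration we assume the MAC shares are kept as XOR shares of the unreduced polynomial product \(\alpha \cdot v\in \mathbb {F}_2[X]\) (an assumption of ours, matching the “skip the reduction” remark: the \((2s-1)\)-bit product is zero-tested directly):

```python
import secrets

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[X]) multiplication of two s-bit polynomials;
       the product has up to 2s-1 bits and is NOT reduced."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def verify(v_shares, mac_shares, alpha_shares):
    """Verify sub-circuit: XOR the shares together, multiply alpha*v over
       GF(2)[X], and zero-test m(v) xor alpha*v without modular reduction."""
    v = m = a = 0
    for vs, ms, als in zip(v_shares, mac_shares, alpha_shares):
        v ^= vs; m ^= ms; a ^= als
    return (m ^ clmul(a, v)) == 0

def xor_share(x, n, bits):
    """XOR-share a bits-bit value among n parties."""
    sh = [secrets.randbits(bits) for _ in range(n - 1)]
    acc = 0
    for y in sh:
        acc ^= y
    return sh + [x ^ acc]

s, n = 40, 3
alpha, v = secrets.randbits(s), secrets.randbits(s)
assert verify(xor_share(v, n, s),
              xor_share(clmul(alpha, v), n, 2 * s),
              xor_share(alpha, n, s))
```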

Authenticated Share. The CPU-step circuit produces the value \(b^{\mathsf {write} }\) to be written to memory, which obviously cannot be output in the clear; rather, it is shared among all parties. The sub-circuit \(\mathsf {AuthShare}(v,[\alpha ],[r_1],[r_2])\) is given the value v to share (which refers to \(b^{\mathsf {write} }\)), the global MAC key \(\alpha \) and two freshly chosen \(r_1,r_2\in \mathbb {F}_{2^s}\) from the parties, such that party \(p_i\) inputs \((\alpha ^i,r_1^i,r_2^i)\). The circuit computes

$$ r_1=\sum _{i=1}^{n} r_1^i ,\,\, \,\,\,\,\,\,\,\, r_2=\sum _{i=1}^{n} r_2^i ,\,\, \,\,\,\,\,\,\,\, \alpha =\sum _{i=1}^{n} \alpha ^i $$

and outputs \(\mathsf {val}=(v+r_1)\) and \(\mathsf {mac}=(\alpha \cdot v + r_2)\). To obtain the SPDZ sharing \(\llbracket v\rrbracket \), party \(p_1\) stores \(v^1=(\mathsf {val}-r_1^1)\) and \(m(v)^1=(\mathsf {mac}-r_2^1)\) and all other parties \(p_j\) store \(v^j=(-r_1^j)\) and \(m(v)^j=(-r_2^j)\).

Security. First note that \(\sum _{i=1}^{n} v^i = \mathsf {val}-r_1^1-\sum _{j=2}^{n} r_1^j=v\) and \(\sum _{i=1}^{n} m(v)^i = \mathsf {mac}-r_2^1 - \sum _{j=2}^{n} r_2^j= m(v)=\alpha \cdot v\) as required. The value v and its MAC m(v) are independently masked using truly random values and are thus hidden from any strict subset of the parties.
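The mask-and-unmask pattern of \(\mathsf {AuthShare}\) can be sketched as below; for readability we work over an illustrative prime field rather than \(\mathbb {F}_{2^s}\) (where addition would be XOR), and all names are ours:

```python
import secrets

P = 2**61 - 1  # illustrative prime field standing in for F_{2^s}

def auth_share(v, alpha_shares, n):
    """AuthShare: output v and alpha*v masked by fresh randomness r1, r2
       in the clear; the parties then unmask locally to obtain [[v]]."""
    r1 = [secrets.randbelow(P) for _ in range(n)]
    r2 = [secrets.randbelow(P) for _ in range(n)]
    alpha = sum(alpha_shares) % P
    val = (v + sum(r1)) % P           # public circuit output 1
    mac = (alpha * v + sum(r2)) % P   # public circuit output 2
    # p_1 subtracts the public outputs' masks; the others negate theirs.
    v_sh = [(val - r1[0]) % P] + [(-r) % P for r in r1[1:]]
    m_sh = [(mac - r2[0]) % P] + [(-r) % P for r in r2[1:]]
    return v_sh, m_sh

n = 3
alpha_shares = [secrets.randbelow(P) for _ in range(n)]
v_sh, m_sh = auth_share(7, alpha_shares, n)
assert sum(v_sh) % P == 7
assert sum(m_sh) % P == 7 * sum(alpha_shares) % P
```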

Performance. For every data item in \(\mathbb {F}_{2^s}\) the parties store its MAC as well, which leads to an overhead of 2 in the memory size. To obtain the key of an input wire of the next circuit, each party needs to broadcast its BMR key, which is of size \(\kappa \). Since for every read of a data item each party inputs four \(\mathbb {F}_{2^s}\) elements (assuming that the global key \(\alpha \) is part of the state), the communication complexity is \(4\kappa n\) bits per input bit. The additional circuitry for authentication (and verification) is of size \(2s^2+2s-1\) (for two multiplications of elements from \(\mathbb {F}_{2^s}\) plus the zero test); note that in Table 2 the required number of multiplication triples is multiplied by n, since each AND gate manipulates key vectors of n coordinates. To obtain the keys for the next CPU-step circuit, 2 communication rounds are required: one to broadcast the external value of the input wire (done by the party with whom the input is associated) and the other for all parties to broadcast the appropriate keys. Note that we cannot save a communication round by broadcasting the external value when writing (rather than when reading), since the external value that the parties broadcast depends on the input wire of the circuit that will read it at a later, unknown point in time.

4.2 Memory via Wire Soldering

The General Technique. Wire soldering allows the parties to reuse an output wire key of one gate as an input wire key of another gate, even if these two gates were not meant to be connected at garbling time. The notion of wire soldering for secure computation was introduced in [32] for the two-party setting and implemented using an additively homomorphic commitment scheme \(({\mathsf {com}},{\mathsf {dec}})\), that is, \({\mathsf {com}}(a)+{\mathsf {com}}(b)={\mathsf {com}}(a+b)\). Let u and v be wires with keys \(k_{u,0},k_{u,1},k_{v,0},k_{v,1}\) and permutation bits \(\lambda _u,\lambda _v\). By soldering v to u we would like to achieve the following feature: obtaining key \(k_{u,\varLambda _u}\), which carries a real value \(\rho _u=\lambda _u\oplus \varLambda _u\), enables obtaining the key \(k_{v,\varLambda _v}\), which carries the same real value \(\rho _v=\lambda _v\oplus \varLambda _v=\rho _u\). It follows that if \(\lambda _u=\lambda _v\) the soldering information reveals \(k_{v,\varLambda _u}\) (i.e. \(\varLambda _u=\varLambda _v\)); otherwise, if \(\lambda _u \ne \lambda _v\), it reveals \(k_{v,(1-\varLambda _u)}\).

In a circuit-based 2PC (with garbler and evaluator) this is done by having the garbler send the commitments \({\mathsf {com}}(k_{u,0})\), \({\mathsf {com}}(k_{u,1})\), \({\mathsf {com}}(k_{v,0})\), \({\mathsf {com}}(k_{v,1})\); if \(\lambda _u=\lambda _v\) the garbler also sends the decommitments \(s_0={\mathsf {dec}}(k_{u,0}\oplus k_{v,0})\) and \(s_1={\mathsf {dec}}(k_{u,1}\oplus k_{v,1})\); otherwise (if \(\lambda _u \ne \lambda _v\)) the garbler sends \(s_0={\mathsf {dec}}(k_{u,0}\oplus k_{v,1})\) and \(s_1={\mathsf {dec}}(k_{u,1}\oplus k_{v,0})\). Given the key \(k_{u,\varLambda _u}\), the evaluator computes \(k_{u,\varLambda _u}\oplus s_{\varLambda _u}\) to obtain the correct key \(k_{v,\varLambda _v}\) for wire v. To prove that the garbler has not inverted the truth value of the wires by choosing the wrong case above, it must also decommit to the XOR of the permutation bits \((\lambda _u\oplus \lambda _v)\). Note that the evaluator learns whether \(\varLambda _u=\varLambda _v\) and thus also learns whether \(\lambda _u=\lambda _v\); however, that does not reveal anything about the real value \(\rho _u=\rho _v\) that is carried over those wires.

Soldering in Our Scheme. When \(\mathcal {P}_\mathsf {BMR}\) (Fig. 5) uses SPDZ to garble the circuit, party \(p_i\) not only obtains its own keys \(k_{w,b}^i\) in the clear, but also obtains a SPDZ sharing of both the whole keys \(k_{w,b}\) and the permutation bits \(\lambda _w\) for every \(w\in W\) and \(b\in \{0,1\}\); thus, the parties can use \(\mathcal {F}_{\mathsf {MPC}}\) to perform arithmetic operations over them.

Let u and v be wires with keys \(k_{u,0},k_{u,1}=k_{u,0}\oplus \varDelta ^u\) and \(k_{v,0},k_{v,1}=k_{v,0}\oplus \varDelta ^v\) and permutation bits \(\lambda _u,\lambda _v\in \{0,1\}\). The parties perform the procedure \(\mathsf {Solder(u,v)}\) defined as follows: If \(\varLambda _u=0\) the parties collaboratively compute

$$\begin{aligned} s_{u \rightarrow v}^0 = \bigg ( (\lambda _u\oplus (1-\lambda _v))\cdot (k_{u,0}\oplus k_{v,0}) \bigg ) \oplus \bigg ( (\lambda _u\oplus \lambda _v)\cdot (k_{u,0}\oplus k_{v,1}) \bigg ) \end{aligned}$$

and output \(s_{u \rightarrow v}^0\) to everyone.

Otherwise, if \(\varLambda _u=1\) the parties collaboratively compute

$$\begin{aligned} s_{u \rightarrow v}^1 = \bigg ( (\lambda _u\oplus (1-\lambda _v))\cdot (k_{u,1}\oplus k_{v,1}) \bigg ) \oplus \bigg ( (\lambda _u\oplus \lambda _v)\cdot (k_{u,1}\oplus k_{v,0}) \bigg ) \end{aligned}$$

and output \(s_{u \rightarrow v}^1\) to everyone.

The information \(s_{u \rightarrow v}^{\varLambda _u}\) allows the parties to solder wire v to wire u. Notice that this technique involves only one multiplication layer, since the parties compute both multiplications simultaneously and then locally add the results in \(\mathbb {F}_{2^\kappa }\).

Observe that our variation of the soldering is applicable to the multiparty setting as well; in addition, since SPDZ shares of the full wire keys already exist, we do not need to rely on an additional homomorphic commitment scheme and its expensive overhead. Moreover, the original soldering was conceived as a way to connect two wires within the same circuit (using a single global difference), while here we show that it is applicable to wires of different circuits as well (which were garbled independently and with two global differences \(\varDelta ^u, \varDelta ^v\)).

To see why it works, without loss of generality, consider \(\varLambda _u=0\). If \(\lambda _u=\lambda _v\), then given key \(k_{u,0}\) that carries value \(\rho _u=\lambda _u\), the parties compute

$$\begin{aligned} k_{u,0}\oplus s_{u \rightarrow v}^0= & {} k_{u,0} \oplus \big ( 1 \cdot (k_{u,0}\oplus k_{v,0} ) \big ) \oplus \big ( 0\cdot (k_{u,0}\oplus k_{v,1} ) \big ) \\= & {} k_{v,0} \end{aligned}$$

such that \(k_{u,0}\) and \(k_{v,0}\) encapsulate the same real value as required. If \(\lambda _u\ne \lambda _v\) we get

$$\begin{aligned} k_{u,0}\oplus s_{u \rightarrow v}^0= & {} k_{u,0} \oplus \big ( 0 \cdot (k_{u,0}\oplus k_{v,0} ) \big ) \oplus \big ( 1 \cdot (k_{u,0}\oplus k_{v,1} ) \big ) \\= & {} k_{v,1} \end{aligned}$$

such that \(k_{u,0}\) and \(k_{v,1}\) encapsulate the same real value as required. The same analysis holds when \(\varLambda _u=1\).
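The case analysis above can be exercised exhaustively with a small plaintext simulation (keys as random integers; in the protocol the selection bits and keys are SPDZ-shared and the two products are computed in one multiplication layer):

```python
import secrets

KAPPA = 128

def solder(Lam_u, lam_u, lam_v, k_u, k_v):
    """Compute s_{u->v}^{Lam_u}; k_u = (k_u0, k_u1), k_v = (k_v0, k_v1)."""
    b = Lam_u
    same = lam_u ^ lam_v ^ 1   # 1 iff lam_u == lam_v (= lam_u xor (1-lam_v))
    diff = lam_u ^ lam_v
    return (same * (k_u[b] ^ k_v[b])) ^ (diff * (k_u[b] ^ k_v[1 - b]))

# Soldering maps the key carrying rho on wire u to the key carrying rho on v,
# even though u and v use independent keys and global differences.
delta_u, delta_v = secrets.randbits(KAPPA), secrets.randbits(KAPPA)
k_u0, k_v0 = secrets.randbits(KAPPA), secrets.randbits(KAPPA)
k_u = (k_u0, k_u0 ^ delta_u)
k_v = (k_v0, k_v0 ^ delta_v)
for lam_u in (0, 1):
    for lam_v in (0, 1):
        for rho in (0, 1):
            Lam_u = rho ^ lam_u
            s = solder(Lam_u, lam_u, lam_v, k_u, k_v)
            assert k_u[Lam_u] ^ s == k_v[rho ^ lam_v]
```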

Performance. To obtain the key of the next circuit the parties simultaneously compute 2 multiplications (of n keys) over the shares in one round and then open the result in the second round, hence 2n multiplication triples are required in the offline phase. Multiplication requires the communication of \(3n\kappa \) bits per party, opening requires \(\kappa \), a total of \((3n+1)\kappa \) per party per input bit.

4.3 Memory via Free Conversion Between Keys and Shared Real Values

In this section we present a new technique, which outperforms both the embedding and soldering techniques in both the communication and memory-size overheads. Essentially, it allows the parties to freely convert between BMR wire keys and SPDZ secret shares of the real values that those keys represent. As before, it is not necessary to know at garbling time which SPDZ share to convert from or to. This allows reactive memory accesses in the sense that, during evaluation, the parties can evaluate previously garbled circuits on values read from memory at an address that was only just revealed during the evaluation phase. The latter is crucial for implementing ORAM.

Using this technique the parties need to compute only (local) additions and a few SPDZ openings in order to move from the evaluation of one circuit to the next. In more detail, converting from wire keys to SPDZ shares requires no communication at all, while the other direction requires two rounds of SPDZ opening. In either case, no multiplication is necessary; hence, no offline overhead (for triple generation) is incurred. Similarly, the required information is a by-product of the BMR offline phase, hence there is no extra cost there.

When reading a bit from memory the parties need to know the external value of the wire associated with it. For a circuit-input wire, which is associated with a particular party, that party knows the wire’s permutation and external values; hence, it can broadcast the external value to the parties, who can then broadcast their appropriate shares of the BMR key. In contrast, when reading from memory, the external value of the wire is shared (it is nobody’s input) and must be reconstructed; then the parties broadcast their keys as before. The keys that the parties broadcast are stored by each party along with the garbled circuit, in the same streaming manner, and are not part of the program’s memory.

Packing secret bits. Naively storing each bit as a SPDZ secret share would require \(s^2\) bits in memory for every s-bit data block. We can reduce this overhead by packing s secret-shared bits into one s-bit secret share, so that only 2s bits of memory are needed for every s-bit data block (s bits for the share itself and another s bits for its MAC). Packing s bits \(\llbracket b_0\rrbracket ,\ldots ,\llbracket b_{s-1}\rrbracket \) is done by computing \(\llbracket B\rrbracket =\sum _{i\in [s]} b_i\cdot 2^i\), where the \(2^i\) part is constant, so we obtain \(\llbracket B\rrbracket \) by local computation only. Now we can operate over bits easily by inputting the entire data item and using the specific required bit. “Extracting” the j-th bit, \(\llbracket b_j\rrbracket \), from \(\llbracket B\rrbracket \) can be done locally as well, as described below.
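The packing step can be sketched as follows, assuming for illustration plain XOR sharing of single bits (MACs omitted; all names are ours). Because the bit positions are disjoint, the local per-party sums are compatible with XOR reconstruction, mirroring addition in the field of size \(2^s\):

```python
import secrets
from functools import reduce
from operator import xor

s, n = 8, 3  # block size in bits, number of parties

def share_bit(b, parties):
    """XOR-share a single bit."""
    sh = [secrets.randbits(1) for _ in range(parties - 1)]
    sh.append(b ^ reduce(xor, sh, 0))
    return sh

def pack(bit_shares):
    """Each party locally packs its s bit-shares into one s-bit share:
       B^j = sum_i b_i^j * 2^i (positions are disjoint, so sum == XOR)."""
    return [sum(bits[i] << i for i in range(s)) for bits in zip(*bit_shares)]

bits = [secrets.randbits(1) for _ in range(s)]
bit_shares = [share_bit(b, n) for b in bits]   # bit_shares[i][j] = b_i^j
packed = pack(bit_shares)                      # one s-bit share per party
B = reduce(xor, packed)                        # opening the packed value
assert [(B >> i) & 1 for i in range(s)] == bits
```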

Writing to Memory. Recall that after issuing the Garble instruction in \(\mathcal {F}_\mathsf {BMR}\), the parties hold shares of the permutation bits of all wires. Recall also that the wire groups \(W^t_{\mathsf {addr, wr}}\) and \(W_\mathsf {write}^t\) refer to the wires associated with the address to be written and the value to be written to that address, respectively. In the protocol, the parties open the permutation bits for the wires in \(W^t_{\mathsf {addr, wr}}\) but not for the wires in \(W_\mathsf {write}^t\); this means that they learn \(i_{t}^{\mathsf {write} }\) in the clear, but learn nothing about \(b_{t}^{\mathsf {write} }\); rather, they only obtain the keys and the external values associated with it. That is, for \(w\in W_\mathsf {write}^t\), the parties obtain \(k_{w,\varLambda _w}\) and \(\varLambda _w\). To store the real value that is carried by wire w in memory address i, the parties only need to compute \(\llbracket \rho _w\rrbracket =\varLambda _w+\llbracket \lambda _w\rrbracket \). Then \(p_i\) stores \(D[i]\leftarrow \llbracket \rho _w\rrbracket ^i\). Furthermore, every party \(p_i\) can check whether \(k_{w,\varLambda _w}\) is correct, because it has obtained both \(k^i_{w,0}\) and \(k^i_{w,1} = k^i_{w,0} \oplus \varDelta _i\) during Garble. This is equivalent to checking the correctness of output wires. In order to achieve optimal memory usage, s bits \(w_0, \dots , w_{s-1}\) can be combined by (locally) computing \(\sum _{i=0}^{s-1} X^i \llbracket \rho _{w_i}\rrbracket \), where X denotes a generator of the multiplicative group of a field of size \(2^s\).

Reading from Memory. Let \(i_{t-1}^{\mathsf {read} }\) be the address from which the parties are instructed to read when evaluating \(GC^{t-1}\) and let \(w\in W_\mathsf {read}^t\). Assume for a moment that the secret-share packing technique above has not been applied when storing. Then \(D[i_{t-1}^{\mathsf {read} }]\) contains a share of the bit \(\llbracket \rho _{w'}\rrbracket \) that was most recently written to \(i_{t-1}^{\mathsf {read} }\) at a previous timestep \(t'\) through wire \(w'\in W_\mathsf {write}^{t'}\). Party \(p_i\) holds both \(k_{w,0}^i\) and \(k_{w,1}^i\), but needs to broadcast only one of them, namely \(k_{w,\varLambda _w}^i\) for \(\varLambda _w=\rho _w\oplus \lambda _w\). Since we require that \(\rho _w=\rho _{w'}\), the parties open \(\varLambda _w=\llbracket \rho _w\rrbracket +\llbracket \lambda _w\rrbracket =\llbracket \rho _{w'}\rrbracket +\llbracket \lambda _w\rrbracket \) and broadcast \(k_{w,\varLambda _w}\). Finally, if the parties have stored \(\sum _{i=0}^{s-1} X^i \llbracket \rho _{w_i}\rrbracket \) at a particular memory address, the external values \(\varLambda _{w_i}\) can be obtained by opening \(\sum _{i=0}^{s-1} X^i \llbracket \rho _{w_i}\rrbracket + \sum _{i=0}^{s-1} X^i \llbracket \lambda _{w_i}\rrbracket \). This works because any field of size \(2^s\) has characteristic two, so addition corresponds to bitwise XOR.
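The read direction can be sketched similarly (again with plain XOR shares and dummy 16-byte keys; `share_bit` and the key handling are illustrative placeholders, not the implementation): the parties open \(\varLambda _w\) from their shares of \(\rho _{w'}\) and \(\lambda _w\), and then each party broadcasts only the key selected by \(\varLambda _w\):

```python
import secrets

N = 3

def share_bit(b, n):
    """XOR-share a bit among n parties."""
    shares = [secrets.randbits(1) for _ in range(n - 1)]
    last = b
    for sh in shares:
        last ^= sh
    return shares + [last]

rho = 1                          # bit most recently written to the address
rho_shares = share_bit(rho, N)   # what the parties hold as D[i]
lam = secrets.randbits(1)        # mask lambda_w of the input wire w of GC^t
lam_shares = share_bit(lam, N)

# Opening Lambda_w = rho_{w'} + lambda_w: each party broadcasts the XOR of
# its two shares; the opened value is the XOR of all broadcasts.
Lambda = 0
for rs, ls in zip(rho_shares, lam_shares):
    Lambda ^= rs ^ ls
assert Lambda == rho ^ lam

# Each party p_i holds both keys but broadcasts only k_{w,Lambda}^i; the
# other key, and hence rho itself, stays hidden.
keys = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(N)]
broadcast = [keys[p][Lambda] for p in range(N)]
```

In the actual protocol the opening of \(\varLambda _w\) is MAC-checked by SPDZ, which is what rules out an additive error by a corrupted party.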

5 Realizing Functionality \(\mathcal {F}_\mathsf {UAP}\)

Protocol \(\mathcal {P}_{\mathsf {UAP}}\) in Fig. 7 realizes \(\mathcal {F}_\mathsf {UAP}\) in the \(\mathcal {F}_\mathsf {BMR}\)-hybrid model.

Fig. 7.
figure 7

Realizing \(\mathcal {F}_\mathsf {UAP}\) in the \(\mathcal {F}_\mathsf {BMR}\)-hybrid model.

5.1 Security of Protocol \(\mathcal {P}_\mathsf {UAP}\)

The security of our construction relies on the security of the underlying BMR and SPDZ protocols and on the security of the transformation between garbled wires and SPDZ shares. Informally, the latter can be seen as follows: neither transformation reveals any secret information, because one direction (writing to memory) is done locally, while the other (reading from memory) only reveals an external value and the corresponding wire label, both of which hide the real value carried over the wire by the security of the BMR protocol. For malicious security, note that revealing the external value is done using SPDZ, which guarantees correctness by checking the MAC. Furthermore, if any party broadcasts a faulty share of the BMR key, this is guaranteed to lead to an invalid output key (and thus easy detection by honest parties) by the properties of the BMR protocol. More formally, we prove the following theorem:

Theorem 5.1

Protocol \(\mathcal {P}_\mathsf {UAP}\) (Fig. 7) realizes functionality \(\mathcal {F}_\mathsf {UAP}\) (Fig. 2) in the \(\mathcal {F}_\mathsf {BMR}\)-hybrid model.

Proof

Let \(\mathcal {A}\) be an adversary controlling a subset of the parties, denoted \(A=\{p_{i_1},\ldots ,p_{i_c}\}\) and denote by \(\bar{A}=[n]\smallsetminus A\) the subset of the honest parties.

We present a simulator \(\mathcal {S}\) who participates in the ideal execution \(\mathcal {F}_\mathsf {UAP}\) by taking the role of A and in an internal execution of \(\mathcal {P}_\mathsf {UAP}\) with \(\mathcal {A}\), in which \(\mathcal {S}\) takes the role of \(\bar{A}\) and the functionality \(\mathcal {F}_\mathsf {BMR}\). The simulator \(\mathcal {S}\) uses another simulator \(\mathcal {S}_\mathsf {BMR}\) that, when given the adversary’s input/output to/from a circuit and both keys to all wires in the garbled circuit, produces a view that is indistinguishable from the adversary’s view of the evaluation of the circuit in the real execution (such a simulator was presented in [25]).

The simulator \(\mathcal {S}\) does as follows:

  1.

    Extract \(\mathcal {A}\) ’s inputs

    (a)

      In the internal execution, garble T copies of \(\mathrm {C}_{\scriptscriptstyle \mathrm {CPU}}\) exactly as described in \(\mathcal {F}_\mathsf {BMR}\). In particular, for every input wire \(w\in GC^1\) associated with a corrupted party \(p_c\in A\) output \(\lambda _w\) in the clear to \(p_c\).

    (b)

      Upon issuing the Input command in the internal execution, for an input wire w associated with a corrupted party \(p_c\in A\), receive \(p_c\)’s external value \(\varLambda _w\) and compute \(p_c\)’s input to wire w by \(\rho _w=\varLambda _w\oplus \lambda _w\) (the simulator \(\mathcal {S}\) knows \(\lambda _w\) because it was garbling the circuit on behalf of \(\mathcal {F}_\mathsf {BMR}\)).

  2.

    Engage in the ideal execution \(\mathcal {F}_\mathsf {UAP}\) by inputting the values extracted above as the corrupted parties’ input and obtain \(\varvec{y}=\varPi (D,\varvec{x})\) along with \(\mathsf {AP}(\varPi ,D,\varvec{x})\).

  3.

    Open all garbled circuits \(GC^1,\ldots ,GC^T\) toward the adversary.

  4.

    Evaluation

    (a)

      Invoke \(\mathcal {S}_\mathsf {BMR}\) with \(\mathcal {A}\)’s inputs that were extracted earlier, the garbled circuit \(GC^1\) and the adversary’s output from \(GC^1\): \(\mathsf {access}_1=(i_{1}^{\mathsf {read} },i_{1}^{\mathsf {write} })\).Footnote 6 Output whatever \(\mathcal {S}_\mathsf {BMR}\) produced.

    (b)

      Note that A has no inputs to circuits \(GC^2,\ldots ,GC^T\); thus, for every such circuit we invoke \(\mathcal {S}_\mathsf {BMR}\) with no inputs at all. Then, for every input wire of the garbled circuit, \(\mathcal {S}\) checks which of the keys \(k_{w,0}\) or \(k_{w,1}\) was produced in the simulated view, and then supplies the correct share of the external value of that key (\(\mathcal {S}\) can do this since it knows the global MAC key used for the SPDZ shares). Formally, for \(t=2\) to T:

      i.

        Output \(\mathsf {access}_{t-1}=(i_{t-1}^{\mathsf {read} },i_{t-1}^{\mathsf {write} })\) in the clear.

      ii.

        Invoke \(\mathcal {S}_\mathsf {BMR}\) with \(GC^t\). If \(t=T\) then supply \(\mathcal {S}_\mathsf {BMR}\) with \(\varvec{y}\) as well.

      iii.

        For every input wire w of \(GC^t\), extract from the produced view the key \(k_{w,\varLambda _w}\) used in the evaluation.

      iv.

        For every input wire w of \(GC^t\), simulate the opening of \(\llbracket \varLambda _w\rrbracket =\llbracket \lambda _w\oplus \rho _w\rrbracket \) (from Step 3 in \(\mathcal {P}_\mathsf {UAP}\)) as follows: let the shares of A be \(\varLambda _w^A\); then \(\mathcal {S}\) chooses random shares \(\varLambda _w^h\) for every honest party \(p_h\in \bar{A}\) such that \(\varLambda _w=\varLambda _w^A+\sum _{h\in \bar{A}} \varLambda _w^h\) and uses them to open \(\varLambda _w\).

      v.

        Output the view produced by \(\mathcal {S}_\mathsf {BMR}\) in Step 4(b)ii above.

Claim

For every PPT \(\mathcal {A}\) and for every \(\varvec{x}\), the output of \(\mathcal {S}\) above is indistinguishable from the view of \(\mathcal {A}\) in the real execution of \(\mathcal {P}_\mathsf {UAP}\).

Proof

We define hybrid \(\mathbf{Hyb}_t\) as follows: the adversary’s view in the real execution of \(\mathcal {P}_\mathsf {UAP}\) for timesteps \(1,\ldots ,t\), followed by the simulated view for timesteps \(t+1,\ldots ,T\) as described in Step 4b of the simulation above. Thus, \(\mathbf{Hyb}_T\) is exactly the real execution of \(\mathcal {P}_\mathsf {UAP}\) and \(\mathbf{Hyb}_0\) is exactly the output of \(\mathcal {S}\) described above. Assume by contradiction that there exists a PPT \(\mathcal {D}\) who distinguishes between \(\mathbf{Hyb}_T\) and \(\mathbf{Hyb}_0\) with non-negligible probability; we then construct \(\mathcal {D}'\) who distinguishes between a real execution of \(\mathcal {P}_\mathsf {BMR}\) and the output of \(\mathcal {S}_\mathsf {BMR}\) (for a single circuit) as follows. The existence of \(\mathcal {D}\) implies that there exists \(t'\) for which \(\mathcal {D}\) distinguishes between \(\mathbf{Hyb}_{t'}\) and \(\mathbf{Hyb}_{t'+1}\) with non-negligible probability. Then, given a circuit C and a view V which is either the view of the adversary in a real execution of \(\mathcal {P}_\mathsf {BMR}\) or the output of \(\mathcal {S}_\mathsf {BMR}\), \(\mathcal {D}'\) generates a real view of the execution of \(\mathcal {P}_\mathsf {UAP}\) for timesteps \(1,\ldots ,t'\), plugs in V together with the opening of the external values of the input wires of C, and then completes the simulation according to Step 4b above. Finally, it hands the resulting view to \(\mathcal {D}\) and outputs whatever \(\mathcal {D}\) outputs. Observe that if V is a view of a real execution then the above is distributed exactly as \(\mathbf{Hyb}_{t'+1}\); otherwise, it is distributed exactly as \(\mathbf{Hyb}_{t'}\). It follows that \(\mathcal {D}'\) distinguishes between the real execution of \(\mathcal {P}_\mathsf {BMR}\) and the output of \(\mathcal {S}_\mathsf {BMR}\) with non-negligible probability, contradicting the security of \(\mathcal {P}_\mathsf {BMR}\).

6 Optimizing BMR Evaluation

The free-XOR technique of [2] makes the space and communication complexities linear in the number of AND gates (XOR gates are almost for freeFootnote 7). In this section we show how to further decrease memory consumption in the online phase by a factor of up to 2. Even though our technique could be applied to a plain BMR protocol, we present the idea over a scheme that uses free-XOR. We stress that it is not limited to secure RAM computation but is applicable to any BMR-based protocol, even with only a single execution.

The \(\mathsf {Evaluate}\) instruction in \(\mathcal {F}_\mathsf {BMR}\) (Fig. 3) that is invoked in the online phase traverses the circuit in topological order and obtains a single output key \(k_w\in (\mathbb {F}_{2^\kappa })^n\) for every wire w in the circuit, until it reaches the output wires. To check the authenticity of \(k_w\), party \(p_i\) extracts the i-th element, \(k_w^i\in \mathbb {F}_{2^\kappa }\), and verifies that it is one of the keys given to it by \(\mathcal {F}_{\mathsf {MPC}}\) in the offline phase, that is, \(k_w^i\in \{ k_{w,0}^i, k_{w,1}^i \}\). If \(k_w^i\not \in \{ k_{w,0}^i, k_{w,1}^i \}\) then \(p_i\) notifies all parties that the garbled circuit is corrupted and aborts. Using our technique, \(p_i\) can discard its keys \(\{ k_{w,0}^i, k_{w,1}^i \}\) for all wires right after the garbled-circuit construction is complete (in the offline phase); instead, it has to store only a single bit per wire. Since a garbled gate is of size \(4n\kappa \) and the original verification procedure requires memory of size \(2\kappa \) (i.e. party \(p_i\) stores the two keys of the output wire of the gate), this results in a decrease of memory consumption by a factor of \(\frac{1}{2n}\). However, a greater improvement is achieved for a more recent construction [3]: there the size of a garbled gate is \(4\kappa \) (i.e., independent of the number of parties n), so the memory saving is significant.

Using our technique, the evaluator is spared from loading and comparing 1.5 keys per wire on average (since for half of the wires the verification passes after the first comparison). This loadingFootnote 8 and comparison time has become substantial as the computation of AES has been considerably improvedFootnote 9.

6.1 The Technique

Circuit garbling is done in the offline phase of the protocol using the \(\mathcal {F}_{\mathsf {MPC}}\) functionality (Fig. 11). Let \(\mathsf {lsb}(x)\) denote the least significant bit of x. We instruct \(\mathcal {F}_{\mathsf {MPC}}\) to choose \(\varDelta =(\varDelta _1,\ldots ,\varDelta _n)\) such that \(\mathsf {lsb}(\varDelta _i)=1\) for every \(i\in [n]\). The result is that \(\mathsf {lsb}(k_{w,0}^i) \ne \mathsf {lsb}(k_{w,1}^i)\) for all w and i. When garbling is completed using \(\varDelta \) as described, party \(p_i\) stores the bit \(\delta _w^i = \mathsf {lsb}(k_{w,0}^i)\) for every wire w. In addition, party \(p_i\) discards the keys \(k_{w,0}^i\) and \(k_{w,1}^i\) for all but the output wires.

The evaluation of the circuit proceeds exactly as before, except that the key validity of the output wire of a gate is verified only for output gates. For an inner gate with output wire w, party \(p_i\) obtains the external value \(\varLambda _w\) by computing \(\varLambda _w=\mathsf {lsb}(k_w^i) \oplus \delta _w^i\). This way the parties learn that the key \(k_w^i\) obtained by evaluating a gate is the \(\varLambda _w\)-key. For output gates (i.e. gates whose output wire is also a circuit-output wire), party \(p_i\) verifies that \(k_w^i \in \{k_{w,0}^i,k_{w,1}^i\}\) as before.
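A minimal plaintext sketch of the trick (Python integers stand in for \(\mathbb {F}_{2^\kappa }\) elements, and the key values are random placeholders): forcing \(\mathsf {lsb}(\varDelta _i)=1\) makes the last bits of the two wire keys differ, so one stored bit \(\delta _w^i\) replaces the two-key comparison for inner gates:

```python
import secrets

KAPPA = 128

# Offline: the free-XOR global difference is chosen with lsb(Delta_i) = 1.
delta_i = secrets.randbits(KAPPA) | 1

k0 = secrets.randbits(KAPPA)   # k_{w,0}^i
k1 = k0 ^ delta_i              # k_{w,1}^i = k_{w,0}^i xor Delta_i
assert (k0 & 1) != (k1 & 1)    # the two keys always disagree in the last bit

delta_w = k0 & 1               # the single stored bit per wire; both keys can
                               # now be discarded for all inner wires

# Online: evaluating a gate yields some key k; its external value follows
# from the last bit alone, with no key loading or comparison.
for Lambda, k in ((0, k0), (1, k1)):
    assert (k & 1) ^ delta_w == Lambda
```

Storage per inner wire thus drops from two \(\kappa \)-bit keys to one bit, and the online check is a single XOR of last bits.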

Forcing the last bit of a random element is supported by SPDZ-like implementations of \(\mathcal {F}_{\mathsf {MPC}}\) (e.g. [23]) since they are inherently bit-wise: we can generate \(\kappa -1\) random bits and then compose the field element so that its last bit is 1.
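In a bit-wise \(\mathcal {F}_{\mathsf {MPC}}\) the composition is local once the \(\kappa -1\) random bits are available. A plaintext sketch of this composition step (in the actual protocol the random bits would be secret-shared, and the composition would operate on shares):

```python
import secrets

KAPPA = 128

# kappa - 1 random bits (secret-shared in the actual protocol).
rand_bits = [secrets.randbits(1) for _ in range(KAPPA - 1)]

# Compose the element with its last bit fixed: random bit i lands at
# position i + 1, while position 0 holds the constant 1.
delta_i = 1
for i, b in enumerate(rand_bits):
    delta_i |= b << (i + 1)

assert delta_i & 1 == 1
assert delta_i < 2 ** KAPPA
```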

6.2 Security

Notice that we use the exact same garbling procedure as in [2], except that here the last bit of every \(\varDelta _i\) is known to the adversary (i.e. \(\mathsf {lsb}(\varDelta _i)=1\)), whereas in their scheme all bits of \(\varDelta \) are random. The security of our scheme can be easily reduced to the security of [2], and our simulator is the same as theirs. Let the distinguisher’s advantage in distinguishing between the real execution of our scheme and the ideal execution be \(\epsilon \). Then the advantage of the same distinguisher in distinguishing between the real execution and the simulation of [2] is \(\epsilon ' = \epsilon \cdot \frac{1}{2^h}\) for h honest parties. This holds because the probability of having \(\mathsf {lsb}(\varDelta _i)=1\) in the free-XOR scheme is \(\frac{1}{2}\) for an honest party \(p_i\). Recall that in the original scheme the security depends on h keys of length \(\kappa \); thus, increasing the adversary’s advantage by a factor of \(2^h\) keeps it negligible. Assuming that [2] is secure, we conclude that our scheme is secure as well.

7 Implementation

In this section we report the results of the first (to the best of our knowledge) implementation of garbled-circuit-based secure RAM computation in the setting of active security and a dishonest majority. We chose to implement our third technique (Sect. 4.3) as it is the most efficient one for memory access. We combined our new BMR implementation with the existing SPDZ system [6], and used it to implement an oblivious arrayFootnote 10 using Circuit ORAM [36]. The code is written in C++ using the AES-NI and AVX2 instruction sets.

Experiments. Our timing results below refer to the following experiments:

  1. 1.

    Circuit ORAM [36] using the BMR-SPDZ protocol with the scheme in Sect. 4.3, labeled as ‘BMR, Circuit ORAM’ in the figures below.

  2. 2.

    Circuit ORAM [36] using a pure SPDZ implementation, labeled as ‘Pure SPDZ, Circuit ORAM’.

  3. 3.

    Path ORAM [35] using a pure SPDZ implementation [24], labeled as ‘Pure SPDZ, Path ORAM’.

  4. 4.

    Trivial ORAM, i.e. linear scanning of the entire memory for every access, labeled as ‘BMR, linear scan’.

Path ORAM aims to optimize the bandwidth cost and bandwidth blowup, where bandwidth cost refers to the average number of bits transferred for accessing a single block, and bandwidth blowup is defined as the bandwidth cost divided by the block size (i.e., the bit-length of a data block)Footnote 11. The results by Keller and Scholl [24] are reported using Path ORAM, which seems preferable when round complexity is not a concern. For the sake of comparison, we have also implemented Circuit ORAM using pure SPDZ. A comparison of experiments (1), (2) and (3) in Figs. 8 and 9 shows that our approach outperforms pure SPDZ when the parties are connected over a WAN, independently of the choice of ORAM scheme. Furthermore, experiment (4) allows us to find the breakeven points, that is, up to what memory size a linear scan performs better than applying an ORAM algorithm. Given the simplicity of a linear scan, it is clearly faster for small enough sizes.

All experiments were performed in both LAN and WAN environments to test the influence of our approach of reducing the round complexity. We stress that our implementation is the first in this setting even when considering only two parties. Nevertheless, we report timing results for a protocol with three participants as well.

Parameters. Our security parameters are \(\kappa =128\) and \(s=40\). In all experiments, the oblivious arrays are made up of 32-bit entries, and all figures refer to the array size as the number of such entries. Therefore, our figures range from \(1024 \cdot 32 \approx 32\) kB to \(2^{25} \cdot 32 \approx 1.1\) GB.

Our ORAM implementations (Circuit ORAM and Path ORAM) require up to three recursions such that intermediate ORAMs use 128-bit entries, and we use a linear scan for less than 256 such entries.

All reported results are measured per logical access to the memory (array), which, as explained before, may incorporate many physical accesses.

Environment. Our implementations were run on 4th-generation Intel Core i7 machines with 8 cores running at 3.5 GHz, 16 GB RAM, and an SSD (to store the garbled circuits), connected over a LAN (bandwidth of 1 Gbit/s and RTT of 0.1–0.2 ms).

Furthermore, we simulated a WAN setting on the same machines by extending the round-trip time to 100 ms and restricting the throughput to 50 Mbit/s. Figure 8 shows our results for the two settings with two parties, while Fig. 9 shows our results for three parties. They confirm that using garbled circuits (BMR) is beneficial with high network latencies. With BMR, combining Circuit ORAM with our memory access already surpasses linear scanning below a size of one million entries.

Fig. 8.
figure 8

Two parties over LAN (left) and over WAN (right).

Fig. 9.
figure 9

Three parties over LAN (left) and over WAN (right).

Fig. 10.
figure 10

Offline time per logical access with Circuit ORAM and BMR.

Offline Cost. Finally, for a more complete picture, we have estimated the offline cost in the LAN setting. Figure 10 shows the cost for one access of Circuit ORAM implemented in BMR. All figures are based on the number of AND gates in the circuit computing Circuit ORAM because the preprocessing information required for soldering is essentially a by-product of the circuit generation.

To get a better picture of the offline performance of our protocol, we separated it into three parts:

  • Offline-SPDZ. This is the offline phase of the SPDZ protocol, which is independent of the circuit the parties wish to evaluate. In this phase the parties produce the multiplication triples required for the garbling. The numbers in this part are based on a production rate of 4828 triples per second, as reported by Keller et al. [23].

  • Local-AES. Local computation of AES ciphers. The parties use the results of that computation as input to the Online-SPDZ part, which uses them to construct the garbled circuit.

  • Online-SPDZ. This is the online phase of the SPDZ protocol, in which the parties evaluate a circuit that garbles the actual circuit they want to evaluate in the BMR online phase.

In the figure we can easily observe that Offline-SPDZ dominates the cost by 3–4 orders of magnitude because of the communication cost of MASCOT [23].